Reading list for Thu, Jan.18, 2024

Total: 704, of which 3 were suggested today and 412 have expired

Today's reading list

3 links selected from 292 today Thu, Jan.18, 2024
  1. Cheating is All You Need


    Heya. Sorry for not writing for so long. I’ll make up for it with 3000 pages here.

    I’m just hopping right now. That’s kinda the only way to get me to blog anymore.

    I’ve rewritten this post so many times. It’s about AI. But AI is changing so fast that the post is out of date within a few days. So screw it. I’m busting this version out in one sitting.

    (Spoiler alert: There’s some Sourcegraph stuff at the end, including a product plug and some recruiting stuff. But >80% of this post is just about LLMs–GPT etc.–and you, a programmer.)

    There is something legendary and historic happening in software engineering, right now as we speak, and yet most of you don’t realize at all how big it is.

    LLMs aren’t just the biggest change since social, mobile, or cloud–they’re the biggest thing since the World Wide Web. And on the coding front, they’re the biggest thing since IDEs and Stack Overflow, and may well eclipse them both.

    But most of the engineers I personally know are sort of squinting at it and thinking, “Is this another crypto?” Even the devs at Sourcegraph are skeptical. I mean, what engineer isn’t? Being skeptical is a survival skill.

    Remember I told you how my Amazon shares would have been worth $130 million USD today if I hadn’t been such a skeptic about how big Amazon was going to get, and unloaded them all back in 2004-ish? Right? I told you about that? I’m sure I mentioned it once or twice. Not that I am bitter. No.

    But did I ever tell you about the time AWS was just a demo on some engineer’s laptop? No? Well it was Ruben Ortega and Al Vermeulen. They were walking around the eng department at Amazon, showing their “web service” demo to anyone who’d watch it. This was back in maybe… 2003? Ish? They showed us how you could make a service call over the web, like by hitting a URL and sending the right query parameters.

    Well lo and behold we were skeptical. Why the hell would you make a service call over the web? That’s not what it was even designed for! Not to mention, it obviously wouldn’t perform as well as CORBA (Amazon’s stupid-ass RPC system at the time). The whole thing just didn’t make any sense to us.

    We were seeing the first little flecks of lava from what would become a trillion-dollar volcano of money called AWS and Cloud Computing.

    But a lot of us were skeptical. To most of us, those little lava flecks looked like fireflies.

    The ultra-rare trillion-dollar money volcano

    I could tell you a LOT of stories like the web-services one. Great big shit always starts life as a demo.

    What about chatting with people in a browser? Doesn’t matter whether you’re using Facebook, Google Chat, LinkedIn, or just chatting with a customer service agent: if you’re having a conversation with someone in a browser, all that shit started life as a teeny demo of 2 engineers sending messages back and forth over a “hanging GET” channel back in 2005. Entire industries were built on that one little channel, and it wasn’t even very good.

    What about Kubernetes? I remember seeing a demo of that early on, on Brendan Burns’ work laptop, when it was called mini-Borg. Entire industries are being built on Kubernetes, and it’s not even very good either. 😉 Or look at Docker! Something as innocuous as linux cgroups, a little process-isolation manager, became the technical foundation for containers, which now utterly pervade our industry.

    If you can build something as big as Amazon Web Services with a stack based on a simple service call, or whole social networks and customer service suites based on simple browser-to-browser communication, or a robust way of delivering and managing software based on a little process isolation code, then just imagine how big a thing you could build – bear with me here – if you had the goddamn Singularity as your starting point?

    I mean, I joke, but… I mean… Right? I’m guessing you prolly missed it in OpenAI’s 98-page GPT-4 technical report, but large models are apparently already prone to discovering that “power-seeking” is an effective strategy for increasing their own robustness. Open the PDF and search for “power-seeking” for a fun and totally 100% non-scary read.

    You can build truly massive things by building upon little technical breakthroughs.

    And folks, this technical breakthrough? It ain’t little.

    If you’re not pant-peeingly excited and worried about this yet, well… you should be.

    And yet the Mehs prevail

    We did an internal poll at Sourcegraph: Do you have positive sentiment or negative sentiment about LLMs for coding? Options were Positive, Negative, and Meh. And lo, it was about ⅔ Meh or Negative (i.e., Skeptics), which I suspect is fairly representative of the whole industry.

    I asked around, and even as of a couple weeks ago, some devs questioned whether ChatGPT could even write working code, let alone write a full program simply by being told to write it.

    So here I am, talking about money volcanoes, and my coworkers have formed a huge whirling meh-nado. Which natural disaster should you believe?

    Well I mean, I guess a demo is worth a thousand mehs. Let’s try it out.

    ChatGPT vs Emacs

    Let’s have ChatGPT write some Emacs-Lisp code. I’m picking emacs-lisp because it’s a corner-case language, a bit of a stress test for the LLM, and because it’s easy for me to try it out interactively.

    OK. I just typed this prompt into ChatGPT:

    > Write an interactive emacs-lisp function that pops to a new buffer, prints out the first paragraph of "A tale of two cities", and changes all words with 'i' in them red. Just print the code without explanation.

    ChatGPT writing an emacs-lisp function

    Here’s the code it spat out, if you can’t read the screenshot:

    (defun display-tale-of-two-cities ()
      (interactive)
      (let ((text "It was the best of times, it was the worst of times, it was the age of wisdom,
      it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity,
      it was the season of Light, it was the season of Darkness, it was the spring of hope, it was
      the winter of despair, we had everything before us, we had nothing before us, we were all
      going direct to Heaven, we were all going direct the other way – in short, the period was so
      far like the present period, that some of its noisiest authorities insisted on its being
      received, for good or for evil, in the superlative degree of comparison only."))
        (pop-to-buffer "*A Tale of Two Cities*")
        (erase-buffer)
        (insert text)
        (goto-char (point-min))
        (while (re-search-forward "\\b\\w*\\i\\w*\\b" nil t)
          (add-text-properties
           (match-beginning 0) (match-end 0)
           '(face (:foreground "red"))))))

    I copied this code directly into my Emacs session and ran it, and it did exactly what I asked it to do:

    Running the code in an Emacs session

    As you can see from the screenshot, I ran the command and it opened a buffer, printed the requested text, and then turned all the words containing ‘i’ red.

    In one shot, ChatGPT has produced completely working code from a sloppy English description! With voice input wired up, I could have written this program by asking my computer to do it.

    And not only does it work correctly, the code that it wrote is actually pretty decent emacs-lisp code. It’s not complicated, sure. But it’s good code.

    Of course people have done much, much fancier things than this. Someone wrote a product description on a napkin, took a picture, and GPT wrote a working web app that implements the product description on the napkin in the picture. The amount of power here is honestly unknown; it’s more like a cavern that we haven’t fully explored. And it just gets deeper as the LLMs get bigger.

    I mean, this stuff is unbelievably powerful. And yet I am persistently met with a mixture of disbelief and pearl-clutching. Argh, the pearl-clutching! Don’t even get me started on the pearl-clutching. Oh look, now you’ve got me started.

    Okay, you asked for it.

    Whining about Trust Issues

    <Rant mode fully engaged>

    One of the craziest damned things I hear devs say about LLM-based coding help is that they can’t “trust” the code that it writes, because it “might have bugs in it”.

    Ah me, these crazy crazy devs.

    Can you trust code you yeeted over from Stack Overflow? NO!

    Can you trust code you copied from somewhere else in your code base? NO!

    Can you trust code you just now wrote carefully by hand, yourself? NOOOO!

    All you crazy MFs are completely overlooking the fact that software engineering exists as a discipline because you cannot EVER under any circumstances TRUST CODE. That’s why we have reviewers. And linters. And debuggers. And unit tests. And integration tests. And staging environments. And runbooks. And all of goddamned Operational Excellence. And security checkers, and compliance scanners, and on, and on and on!

    So the next one of you to complain that “you can’t trust LLM code” gets a little badge that says “Welcome to engineering motherfucker”. You’ve finally learned the secret of the trade: Don’t. Trust. Anything!

    Peeps, let’s do some really simple back-of-envelope math. Trust me, it won’t be difficult math.

    • You get the LLM to draft some code for you that’s 80% complete/correct.
    • You tweak the last 20% by hand.

    How much of a productivity increase is that? Well jeepers, if you’re only doing 1/5th the work, then you are… *punches buttons on calculator watch* …five times as productive. 😲
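    That back-of-envelope math fits in a few lines of code. A sketch, idealized of course (real work never splits this cleanly):

```python
def speedup(llm_fraction: float) -> float:
    """Idealized productivity multiplier if the LLM drafts `llm_fraction`
    of the work and you finish the remaining (1 - llm_fraction) by hand."""
    return 1.0 / (1.0 - llm_fraction)

print(speedup(0.80))  # the ~5x claim from the text
print(speedup(0.50))  # even a half-decent draft roughly doubles you
```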

    When was the last time you got a 5x productivity boost from anything that didn’t involve some sort of chemicals?

    I’m serious. I just don’t get people. How can you not appreciate the historic change happening right now?

    OK time to get concrete. I’m already on page 7, and my last attempt at this blog ran 25+ pages and languished for weeks.

    Let’s finish this.

    <Rant mode disengaged... but lurking>

    A Brief Mini-History of LLMs

    OK sooooo… this is the part that went on for 20 pages before, so let’s just make it reeeeeal simple. One paragraph.

    Here is everything you need to know about the history of LLMs, for our purposes today:

    A transformer diagram
    • The Google Brain team published a paper in 2017 called Attention is All You Need.
    • It introduced the now-famous Transformer architecture that you see to the left.
    • Everyone uses this now. It replaced ~everything in AI.
    • Google did absolutely nothing with this invention, opting for violent knee-jerking later, as per their usual M.O.
    • Meanwhile, others started training massive Transformers on obscene amounts of data. They began calling them Large Language Models (LLMs).
    • OpenAI came along with ChatGPT on November 30th 2022, the first LLM-based chatbot, missing out on an obvious opportunity to call it Large Marge. Why did they not do this.
    • Ever since then it has been full-on batshit insanity, with new LLM-based products launching daily and technical advances happening every few hours. It’s impossible to track it all.
    • Money Volcano Alert: First lava flecks detected.

    Congrats, you’re all caught up on the history of LLMs. Go watch this amazing video for how to implement it in Python.

    A brief introduction to Coding Assistants

    OK now we can talk coding assistants. They’re just a thing that sits in your IDE and they talk to the LLM for you.

    Depending on the particular assistant, they can read and explain code, document code, write code, autocomplete it, diagnose issues, and even perform arbitrary IDE tasks through “agents” that give the LLM robotic powers, including the ability to wield and target laser guns, if someone wants to put in the work. Some assistants also understand your project environment and can answer questions about build targets, branches, your IDE, etc.

    So, already pretty cool. Right?

    But now they are beginning to be able to perform more complex tasks, such as generating a PR from the diffs on the current branch, including a detailed commit message summarizing the changes.

    Some assistants have a conversational/chat interface, too. This kind can do everything a bot like ChatGPT can do, like drafting emails, or answering random questions about the code base or the environment.

    I personally prefer a coding assistant with a chat interface. In part because I can type, but also because it makes them a platform. I can build my own workflows. Bonus points if they expose the underlying platform bits with APIs.

    I guess the simplest way to think about it would be a sort of “real-time in-IDE Stack Overflow” coupled with a really powerful new set of boilerplate automation tasks.

    OK, congrats again – you’re up to speed on what LLM-based coding assistants can do. It’s… pretty much anything. You could hook it up to outbound email and tell it to sell itself. Sky’s the limit. At this point we’re more limited by imagination than by technology.

    So! Yeah. Coding assistants. I hope by now you get how powerful they’re going to be. They may take different shapes and forms, but they’re all going to be incredibly badass before very much longer.

    Let’s dig a little into how they understand your personal code, and then we’re ready to party. 🎉

    LLMs are trained on an absolutely staggering amount of data… but that doesn’t include your code.

    There are two basic approaches to making the LLM smarter about your code. The first is to fine-tune (or train) on your code. This is not a business model that has been fully fleshed out yet, but it’s coming. And importantly it’s only part of the picture.

    The other way is to bring in a search engine. You can think of it as three related scenarios:

    • A raw LLM is like a Harvard CS grad who knows a lot about coding and took a magic mushroom about 4 hours ago, so it’s mostly worn off, but not totally.
    • Fine-tuning it on your code base is like having it study your code carefully, which means it will give better answers in general.
    • Incorporating a search engine, much like for humans, makes the AI even more effective, because it can answer direct queries very quickly. And importantly, because the search engine can be used to populate the query context.
      • Meaning, a search engine can be useful twice per query – once when figuring out how to describe and contextualize the query, and again potentially when answering the query.

    You talk to LLMs by sending them an action or query, plus some relevant context. So for instance, if you want it to write a unit test for a function, then you need to pass along that whole function, along with any other relevant code (e.g. test-fixture code) so that it gets the test right.

    That context you send over is called the context window, and I think of it as the “cheat sheet” of information that you pass along as part of your query.

    And folks, it ain’t much. It’s almost exactly like a 2-sided index card vs your whole textbook, for an exam. They give you between 4k and 32k tokens of 3-4 characters each, so at best, maybe 100k of text, to input into the LLM as context for your query. That 100k cheat sheet is how you tell the LLM about your code.
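    If you want to sanity-check that cheat-sheet arithmetic, here's a sketch. The 3.5 characters-per-token figure below is just a midpoint of the 3-4 range above, a rule of thumb rather than a real tokenizer:

```python
# Rough context-budget arithmetic: tokens average ~3-4 characters, so a
# 4k-32k token window holds very roughly 14k-115k characters of "cheat sheet".
AVG_CHARS_PER_TOKEN = 3.5  # rough midpoint, not a real tokenizer

def approx_chars(token_window: int) -> int:
    """Approximate how many characters of context fit in a token window."""
    return int(token_window * AVG_CHARS_PER_TOKEN)

for tokens in (4_096, 8_192, 32_768):
    print(f"{tokens:>6} tokens ~ {approx_chars(tokens):>7} chars of cheat sheet")
```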

    In an ideal world, you’d just pass your entire code base in with each query. In fact, Jay Hack just tweeted a graph showing how the latest context window size in GPT-4 compares to some popular code bases:

    Diagram of GPT-4 context window versus code base sizes

    Which is kind of exciting… until you realize that it’s still just incredibly tiny compared to real-world code bases. It’s an index card vs a textbook… just a slightly bigger index card.

    That cheat sheet is all you get. That’s how you talk to an LLM. You pass it a cheat sheet.

    Which means what goes ON that cheat sheet, as you can probably imagine, is really really important.

    And with that, friends, we are finally ready for the punchline, the party, and the demo.

    You made it!

    Cheating is All You Need

    There are, by my last count, approximately 13 hillion frillion jillion LLM-backed coding assistants out there, as of mid-March. But they are all in a desperate race to the bottom, because they’re all using the exact same raw materials: An LLM, your IDE, your code base, and that pesky little context window.

    Nobody can differentiate on the LLM; they’re all about the same. And the IDE and your code base are the same. All they can try to differentiate on is their UI and workflows, which they’re all going to copy off each other. Good for you, bad for them.

    The punchline, and it’s honestly one of the hardest things to explain, so I’m going the faith-based route today, is that all the winners in the AI space will have data moats.

    A “data moat” is, in a nutshell, having access to some data that others do not have access to.

    You need a data moat to differentiate yourself in the LLM world.

    Why? Because the data moat is how you populate the context window (“cheat sheet”).

    If you can’t feed the LLM your whole code base, and you can only show it 100k characters at a time, then you’d better be really goddamn good at fetching the right data to stuff into that 100k-char window. Because that’s the only way to affect the quality of the LLM’s output!

    Put another way, you need a sidecar database. The data moat needs to be fast and queryable. This is a Search Problem!
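    To make "it's a Search Problem" concrete, here's a toy sketch of the job: rank candidate snippets against the query, then greedily pack the winners into the fixed cheat-sheet budget. Real systems use embeddings, a code graph, and far smarter ranking; the keyword overlap below is only a stand-in:

```python
import re

def pack_context(query: str, snippets: list[str], budget_chars: int = 100_000) -> str:
    """Toy greedy retriever: score snippets by keyword overlap with the
    query, then pack as many high scorers as fit in the character budget."""
    def terms(text: str) -> set[str]:
        return set(re.findall(r"\w+", text.lower()))

    q = terms(query)
    ranked = sorted(snippets, key=lambda s: len(q & terms(s)), reverse=True)

    picked, used = [], 0
    for snip in ranked:
        if used + len(snip) <= budget_chars:
            picked.append(snip)
            used += len(snip)
    return "\n\n".join(picked)
```

The interesting part is entirely in the ranking: with a 100k-char budget and a multi-gigabyte code base, almost everything gets left out, so the quality of what you keep is the whole ballgame.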

    This is true even outside the world of engineering. There are probably 13 hillion jillion killion LLM-based outbound sales products being built like right now, as you’re reading this. But only Salesforce and a few other companies with big data moats are going to be able to differentiate in that space.

    Party Time

    OK! You’re finally done learning stuff. I’m very proud that you’ve made it to the end.

    The rest is a private Sourcegraph party. I mean, you can come along if you like, because you’re a friend. I’ll slip you by the door guy.

    You’ve just graduated from Stevey’s LLM Mini-U, and you have all the necessary theoretical background to appreciate why I feel I am the luckiest damn person on Earth, and why I’m throwing a party, right here on page two thousand eight hundred and ninety six of this blog post.

    Because folks, I honestly don’t know how I got so lucky. I joined Sourcegraph in September, not half so much for their product itself as for their Code Intelligence Platform, which was like the one I built back at Google. They’d nearly finished building v1 of this platform and it was ready to start powering something amazing.

    And then LLMs landed 10 weeks after I joined. The Singularity, the Cloverfield monster stomping around eating people, and everything else that’s happened since November 30th. Crazy town.

    And what do LLMs need again? You, in the front row. Yeah, you.

    They need the data moat! The sidecar database. For populating the cheat sheet. Remember?

    It’s a Search problem. And Sourcegraph has spent the past ten years building the solution.

    Go figure.

    Sourcegraph’s platform has four enduring, difficult-to-reproduce dimensions that are incredibly relevant to the coding-assistant space:

    • It’s universal and works across all code hosts and platforms.
    • It’s scalable and ready for enterprises of all sizes.
    • It’s precise and comparable to IDEs in its accuracy and completeness.
    • It’s open and is being developed openly and transparently.

    Sourcegraph’s engine powers gigantic enterprises with literally a hundred thousand git repositories, and/or multi-terabyte massive megarepos that make IDEs fall over and puke. And at its core is an engine so powerful that maybe teaming up with an AI was its destiny all along.

    Whoo boy. Did I get lucky? I think I got pretty lucky. We have such an incredible head-start in this race.

    When I say “We’re building a coding assistant”, I want you to think back to when Ruben Ortega showed us Amazonians his little demo of a remote procedure call over HTTP. That was Baby AWS.

    Now take a look at what my esteemed colleague and Sourcegraph teammate Dominic Cooney slacked me last week:

    In other news, I am getting even more enthusiastic. There is more, much more, to this space than LLMs and I think we have the embryonic stage of some amazing invention here. Like some realizations that context provisioning is a service. That the output has streams, like one into the editor and one into the chat. That the LLM benefits from the opportunity to self criticize. That the UX needs diffs and in-situ things to home in on things. That search is as big a deal as chat. Many of our thumbs down reactions to [LLMs] come from them complaining they didn't have the context of what language, file, codebase, etc. the user is talking about. Which bodes well because Sourcegraph should be able to do that really well.

    He’s glimpsed the future, and it’s vast. His comment, “I think we have the embryonic stage of some amazing invention here”, reminded me of all the embryonic stages I’ve seen of other eventually-amazing things: The Mosaic web browser. The mini-Borg demo that became Kubernetes. The Amazon web-services demo. The hanging-GET request in the browser.

    Little things grow up, folks!

    I’ve seen this movie before. I know how it ends. This volcano is the big one. Skeptics beware. At the very least, I hope by now you’re taking LLM-backed coding assistants in general just a teeny bit more seriously than you were an hour ago.

    OK, you’ve heard the punchline, and the party’s in full swing. Lemme show you Cody and we’ll call it a day.

    A whirlwind tour of Sourcegraph’s Cody

    Say hi to Cody:

    Cody logo

    Cody is Sourcegraph’s new LLM-backed coding assistant. Cody knows about your code. It has templated actions, such as writing unit tests, generating doc comments, summarizing code, that kind of thing. You know. Stuff you can choose from a menu. Like other assistants. It even has code completions, if you’re into that sort of thing.

    Cody is not some vague “representation of a vision for the future of AI”. You can try it right now.

    And it has a chat interface! Which means it’s totally open-ended; you can ask it any question at all about your code base or your environment, and we’ll send it the right cheat sheet. And Cody itself is a platform, because you can use it to build your own LLM-backed workflows.

    My favorite kind. Naturally.

    Currently Cody is a VSCode plugin, though we’ll have it in other places soon enough.

    Using Cody in VS Code

    Cody scales up to the very biggest code bases in the world. And even though Cody is still a baby, just like Baby AWS back on Ruben’s stinkpad in 2003, it’s already able to lift a huge space rhino using only the power of the Force. Hang on, sorry wrong baby.

    Ahem. As I was saying, like Baby AWS, Cody is also a baby with special powers, and honestly… we don’t know how powerful it’s going to get. It seems to be accelerating, if anything.

    Oh, and anyone can sign up to start using Cody now. (Edited after Cody was released.)

    OK so anyway here’s how Cody works:

    A diagram of how Cody works

    Here’s the diagram above, in a nutshell:

    • You ask Cody to do something (for instance, “write a unit test for this function”)
    • Cody populates the cheat sheet / context window query using Sourcegraph’s code intelligence platform (search queries, embeddings retrievals, graphql queries, etc)
    • It sends the context+query to the LLM, and parses the results
    • It optionally inserts the results back into the IDE (depending on the action)
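    The four steps above can be sketched as a single loop. All the function names here (fetch_context, call_llm, apply_to_ide) are hypothetical stubs for illustration, not Cody's actual API:

```python
def fetch_context(query: str) -> str:
    # Stand-in for step 2: Sourcegraph-style retrieval (search queries,
    # embedding retrievals, graph queries) that builds the cheat sheet.
    return "relevant code snippets for: " + query

def call_llm(prompt: str) -> str:
    # Stand-in for step 3: sending context + query to the LLM.
    return f"LLM draft based on a prompt of {len(prompt)} chars"

def apply_to_ide(result: str) -> None:
    # Stand-in for step 4: optionally inserting the result into the IDE.
    pass

def handle_request(user_query: str) -> str:
    """One trip through the assistant loop described in the bullets."""
    context = fetch_context(user_query)          # step 2: populate the cheat sheet
    prompt = f"{context}\n\nTask: {user_query}"  # assemble the context window
    result = call_llm(prompt)                    # step 3: query the LLM
    apply_to_ide(result)                         # step 4: surface the result
    return result
```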

    And of course this is all just the veeeery beginning. This thing will grow into a goliath that enhances everything you do as an engineer.

    Other coding assistants, which do not have Sourcegraph for Step 2 (populating the context), are stuck using whatever context they can get from the IDE. But sadly for them, IDEs weren’t really designed with this use case in mind, and they make it difficult. And more damningly, no IDE scales up to industrial-sized code bases.

    So it’s just plain unfair. I’m just gonna call it like it is. Sourcegraph has an absolutely unfair advantage, which they built up over ten years of building out this incredibly scalable, precise, and comprehensive code intelligence platform, powered by a world-class search engine and code knowledge graph.

    Did I mention I’m lucky? I’m so fortunate to be here, and grateful to be part of this team.

    Afterlogue

    I don’t really know how to finish this post. Are we there yet? I’ve tried to write this thing 3 times now, and it looks like I may have finally made it. I aimed for 5 pages, deliberately under-explained everything, and it’s… fifteen. Sigh.

    But hopefully you got the main takeaways. Baby AWS. Knee jerking. Meh-nadoes. Cheat sheets. Data moats. Cody. You got this!

    LLMs aren’t some dumb fad, like crypto. Yes, crypto was a dumb fad. This is not that.

    Coding assistants are coming. They’re imminent. You will use them this year. They will absolutely blow your mind. And they’ll continue to improve at an astonishing rate.

    They will feel gloriously like cheating, just like when IDEs came out, back in the days of yore. And for a time-constrained developer like me–and I say this as someone who has written over a million lines of production code…

    Cheating is all you need.

    Thanks for reading, and if you like, come be our Head of AI! Or come do something else with us! It’s pretty merry around here already, but the more, the merrier.


  2. Bob Metcalfe wins Turing Award | Hacker News

    Bob Metcalfe wins Turing Award (acm.org) | 831 points by robbiet480 1 day ago | 229 comments


    Bob has been an active member of the Austin startup community for 10+ years and I've talked with him many times. As an EE, it was cool meeting him the first time and once I'd chatted with him a few times, I finally asked the question I'd been dying to ask: How'd you come up with "Metcalfe's Law"?

    Metcalfe's Law states that the value of a network is proportional to the square of the number of connected devices in the system.

    When I finally asked him, he looked at me and said "I made it up."

    Me: .. what?

    Him: I was selling network cards and I wanted people to buy more.

    Me: .. what?

    Him: If I could convince someone to buy 4 instead of 2, that was great. So I told them buying more made each of them more valuable.

    It was mind blowing because so many other things were built on that "law" that began as a sales pitch. Lots of people have proven out "more nodes are more valuable" but that's where it started.

    He also tells a story about declining a job with Steve Jobs to start 3Com and Steve later coming to his wedding. He also shared a scan of his original pitch deck for 3Com which was a set of transparencies because Powerpoint hadn't been invented yet. I think I kept a copy of it..

    Btw, when I say "an active member of the Austin startup community" - I mean that seriously.

    Not only did he teach a class on startups at the University of Texas but regularly came to a coffee meetup for years, attended Startup Weekend demo time, came to Techstars Demo Day, and was generally present. I even got to do the Twilio 5 Minute Demo for one of his classes (circa 2012).

    It was always cool to have someone who shaped our industry just hanging out and chatting with people.


    Absolutely correct. Chatted with him several times circa 2015 to 2016 when working out of Capital Factory in Austin. He was present for all sorts of events such as mentor hours, startup pitches, etc. Funnily enough, he would give you a very stern look if he thought you were taking him for a ride. Have not been there recently as much as I would like, but I imagine he is still around to be found.


    Had a very similar experience hanging out with him and his equally-brilliant wife Robyn in ATX between 2011-2012. Very approachable guy -- impressively so, given his stature in the industry -- but could be quick with the "what the hell are you talking about?" look.


    I respect Metcalfe a lot, but halfway through undergraduate discrete math it was pretty obvious to most people in the class, even before seeing a formal proof, that a fully connected graph has O(n^2) edges. I just figured that people wowed by "Metcalfe's Law" were business types who didn't bring any formal theory into computing.


    Yeah, but basically it’s a statement that value scales linearly with the number of pairwise connections.

    But it's a loose approximation, so it's not good to overanalyze it.

    The number of pairwise connections grows as the square of the number of nodes, and connections ("how many people can you talk to") are valuable, so value grows. And each new connection to a network of N nodes grows the pairwise connections by N, so that's even better.

    Broadcast (one-to-many connections, like giving a speech to a crowd) is an efficiency hack, which is good, and efficiency hacks grow as the number of connections grows, so that's good too...

    ... is more how I think about what Metcalfe was talking about. Which aspects are x, which are x squared, and which are log x is interesting, but that's not all bound up in his simple statement, despite his "as the square" wording.

    And Bob Metcalfe is personally a great guy in all the ways people are saying, but it's not soooo unique; that's the way a lot of tech types were as the mantle passed from the Greatest Generation to the Boomers (and what was that one in the middle, "lost" or "invisible" or something). I'm not suggesting we've lost that (we may have), just saying that's how it was; for instance, as an undergrad you could walk into any professor's office and get serious attention.

    Only two?

    The gravity-law argument based on geographic distribution of traffic, Zipf's Law, and Bradford's Law all have empirical evidence behind them. That's three. Additionally, in another version of the same paper, Bob Briscoe contributed data from British Telecom usage that supported the same scaling rule.

    The second paper that you gave is interesting. Odlyzko was the one who contributed that particular argument. It is right that there are rational reasons to not interconnect. But Metcalfe's Law would imply more of a first-mover advantage than we actually see. In social networks we had Friendster, MySpace, and Facebook, each of which overtook the other. How could a new entrant supplant the king? Not once, but twice?

    Since then new social networks have continued to sprout and succeed. Facebook managed to stay on top, in part through purchasing other networks. One of which (Instagram) is on track to surpass Facebook in revenue.

    Now let's look at the 4 papers that you collected.

    The first and third have the same flaw. They are looking at revenue over time as the network grew. But the growth of the network is not the only change that happened over time.

    1. The Facebook product improved to become more compelling, even for the same users. In part by adding new channels through purchasing other networks.

    2. Facebook kept adding new ways to monetize people, improving revenue.

    3. People's behavior has shifted to more online over time. Thus it was easier to get value from the same users in 2014 than in 2008.

    Because so much has changed, comparing users in 2008 to users in 2014 is not apples to apples.

    Next, let's turn to the last paper. I'm in agreement with patio11 that Bitcoin's valuation has been driven by the largest Ponzi scheme in history. Therefore I view most of its valuation as fake. And so am not inclined to accept arguments from that valuation as valid.

    And I saved the best for last. In section 2.4 the EU paper argues that Briscoe's law (I think Odlyzko should be credited, but Bob Briscoe is in the EU) is more accurate than Metcalfe's law after you hit scale.

    Their argument, in effect, is a variant of one that was discussed privately before we wrote our paper. Our immediate perception of the size of a network is based on how much of our personal social groups are on it; the value we get from that network is based on the same. Therefore our perception of the size of the network is correlated with the value we get from it. If the network mostly contains parts of groups, you do get something like Metcalfe's Law out of this. But once the network contains a lot of complete social groups, members of those groups see the value they gain slow down as the network continues to grow.

    In other words when the connections in the network are a random sampling of the connections that matter to us, growing the network adds valuable connections. Once the network contains the connections that we know matters to us, most of us only benefit marginally from continued growth.
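    The practical difference between the two scaling laws can be sketched quickly (a toy comparison; the function names and numbers are purely illustrative, not from either paper):

    ```python
    import math

    def metcalfe(n):
        # Value grows as n^2: every pair of nodes is assumed equally valuable
        return n * n

    def odlyzko(n):
        # Value grows as n log n: later connections matter less
        return n * math.log(n)

    # How big an advantage does a 10x-larger incumbent have over a challenger?
    big, small = 1_000_000, 100_000
    print(metcalfe(big) / metcalfe(small))  # 100x under Metcalfe's law
    print(odlyzko(big) / odlyzko(small))    # ~12x under n log n
    ```

    Under n log n the incumbent's edge is roughly an order of magnitude smaller, which fits the observation that incumbents have in fact been overtaken repeatedly.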


    Wasn't it specifically "compatibly communicating devices" or something, and not users, the way it was marketed?


    well, according to Alonzo Church, if this is x squared and that is x squared, then this is that.


    (n²-n)/2 is O(n²) as GP claimed (not that it's exactly n², unless the comment was edited before I saw it), so you're both right.

    HN comment of the year winner right here! Makes you wonder how many other laws are built on nothing.

    If there's one thing I learned doing a Ph.D., it's that if you dig deep enough, you find many foundational laws of nature rely on some necessary assumption that, if proven incorrect, would topple the whole thing.


    It's worth mentioning Moore's Law, which was actually a short-term prediction, arguably turned into a business goal. The "law" states that the number of transistors in integrated circuits (like CPUs) doubles every two years (or 18 months by some variations). It wasn't entirely made up, as it was mostly based on advances in manufacturing technology, but it was a prediction made in 1965 that was only supposed to hold for ten years. However, reality kept up with this prediction for far longer than anticipated, until the physical limits of silicon miniaturization became apparent around the mid-00s (maybe later?).
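    As a back-of-the-envelope version of that doubling rule (a sketch; the 1971 Intel 4004 is used as an anchor point, and real chips only loosely track the clean exponential):

    ```python
    def transistors(year, start_year=1971, start_count=2300, doubling_years=2):
        # Project transistor count assuming one doubling every `doubling_years`;
        # 2300 is the transistor count of the Intel 4004 (1971)
        doublings = (year - start_year) / doubling_years
        return start_count * 2 ** doublings

    print(f"{transistors(2005):,.0f}")  # ~300 million, in the ballpark of mid-00s CPUs
    ```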

    I think it technically kept going into the early 2010s due to additional advancements and technically it hasn’t yet stopped but people are generally skeptical that TSMC and Samsung can keep this party going (a party that seems to have stopped for Intel itself apparently).

    Dennard scaling though did end in the mid 00s and this impacted Koomey’s law which talks about performance/watt and saw a similar petering out.

    Apparently, even at a conservative rate, the thermodynamic bound puts the doubling limit at 2080. After that we’ll have to really hope smart people have figured out how to make reversible computing a thing.

    CPU clock speed stopped improving slightly sooner than that. Performance continued to improve, but they switched from making single threaded code faster to adding more cores.

    This was a bit of a bummer for programmers working in single threaded languages who found that their code stopped getting faster for free.


    Lots of (not only European) public funding made the progress to EUV by ASML, Zeiss and others possible.


    Sorta. It didn't hit a hard boundary per se, but SRAM has practically stopped scaling and even random logic is only scaling at about 1.5x every three years or so. Like a lot of cases it's an s-curve, and we're on the other side of the bend at this point.


    Moore's law is almost the opposite of Metcalfe's - Metcalfe's encourages you to build out the network as fast as possible to get the most value; Moore's implies you should wait as long as possible before buying processing power to get the most you can.


    Moore’s law isn’t even dead. It says that the number of transistors per dollar rises at that rate, which is still going. Commenters tend to omit the cost component of Moore’s remark.


    CS "laws" like Metcalfe's are closer to Murphy's Law than Newton's...

    > Makes you wonder how many other laws are built on nothing.

    variance/standard deviation (also btw, a sum of squares concept)

    it marks the inflection points on the Gaussian curve, but so what, the 2nd derivative points to something significant about the integral? Not really. But even if we accept that it does, what does two standard deviations mean? A linear doubling on the x coordinate says what about a hairy population density function? Nothing.

    or similar to Metcalfe's Law, the very widely used Herfindahl Index (also squares!). It's a cross between a hash and a compression: it says something about the original numbers, but wildly different scenarios can collide.


    When observation is translated to "law". That is an act of judgment on the part of the law-maker, purely. Call it "built on nothing" if you like. But as opposed to what?

    "IF proven incorrect" is the important part.

    This "law" isn't somehow less true just because it was originally used as a sales tactic.

    The law implies testable consequences, such as what the economic incentives should be from interconnecting networks. They are good enough that we should expect to see more drive to interconnect, and stronger barriers to entry for future networks, than history actually shows.

    https://www-users.cse.umn.edu/~odlyzko/doc/metcalfe.pdf offers this and several other lines of evidence that the law is wrong, and O(n log(n)) is a more accurate scaling law.


    It is quite telling that when Bob Metcalfe 'makes stuff up' he still hits it out of the park.


    A little confirmation bias on this one. In addition to the infamous "internet will collapse" prediction, he was also pretty whole-hog on the Segway scooter revolutionizing transit.


    So let me enlighten you a bit: we did collapse the internet, and got a testy email from a bunch of backbone maintainers that they were going to block our live video streams (on port 2047) in four weeks' time or so. Which resulted in us moving to the other side of the Atlantic to relieve the transatlantic cable. So even if it didn't make the news, Metcalfe was 100% on the money on that particular prediction. The Segway never had a chance as far as I'm concerned, but the other thing he got just so. But maybe he never knew (and I never knew about his bet).


    He made a very specific prediction - it didn’t pan out - and the fact that there have been near misses and even global degradation events multiple times in the past 3 decades is not relevant. He admitted he was wrong and literally ate his words.

    > But I predict the Internet, which only just recently got this section here in InfoWorld, will soon go spectacularly supernova and in 1996 catastrophically collapse.

    - Bob Metcalfe

    From the paper:

    > In general, connections are not used with the same intensity... so assigning equal value to them is not justified. This is the basic objection to Metcalfe’s Law...

    In my architectonic opinion, the perfect network comprises all nodes operating equally. Ergo the ideal is indeed Metcalfe's law, but architecture and design can be costly, which is simply the inefficient use of resources. These being very precise machines, anything less than 99.999% is amateur, ergo the law obtains.


    We are talking about computer systems that connect a network of humans. Humans are notoriously imprecise and unreliable machines. Anything more than 0.00001% is therefore a miracle.


    Lol, networking people has produced little of real value except the paradigm itself, and social networking is little more than making humans more efficient at marketing to each other. Networking is for DATA. When people behave like networked machines... well that's global capital communism tbqh.


    I remember trying to get NICs to work in Linux and the best advice was usually “just try the 3c509 driver”.


    I remember when I bought my first fast ethernet card, there was some Linux HOWTO that discussed various ethernet NIC's, and crucially, their Linux drivers in excruciating detail. And the takeaway was that if you had a choice, pick either 3com 5xx(?) or Intel card. The 3com card was slightly cheaper at the local computer shop, so that's what I ended up with (595 Vortex, maybe?).


    Yeah, I had gold-plated 100Mb 3Com cards and they were the best (something-905-series?), with full-duplex, hardware offloading, good drivers. I still have one lying around somewhere.

    As a poor college student I scavenged 3c509 cards to build a computer network in an apartment I shared with two other chronic internet users.

    That was right about the time someone solved a bug with certain revisions of the card behaving differently. So suddenly the availability jumped considerably.


    It was well known when I started that you got a card that would work with that (and later for gigabit it was e1000).


    Although he made it up, there's an argument that the value goes up more than linearly. But as the network grows, every node doesn't necessarily need to talk to every other node except in rare circumstances, or they can reach each other through an intermediate point. So maybe O(n log n) would be closer.


    I recall seeing an article a number of years ago that argued just that. That the network effect is nlogn. Still enough to help explain why large networks grow larger, but it also means that overcoming the incumbent is not the insurmountable wall it may seem to be. You may only need to work twice as hard to catch up, rather than orders of magnitude harder.

    He may have "made it up" to improve sales, but from a certain viewpoint it's correct. If you decide to measure the "value" of a network based on the number of node connections, then the number of connections for n nodes is n(n-1)/2 = 0.5n^2 - 0.5n, which is O(n^2).

    Of course, the value of something is hard to measure. Typically you measure value as "benefits - costs", and try to convert everything to a currency. E.g., see: https://www.investopedia.com/terms/c/cost-benefitanalysis.as... . But there are often many unknowns, as well as intangible benefits and costs. That makes the process - which seems rigorous at first - a lot harder to do in reality.

    So while he may have "made it up" on the spot, he had a deep understanding of networking, and I'm sure he knew that the number of connections is proportional to the square of the number of nodes. So I suspect his intuition grabbed a quick way to estimate value, using what he knew about connection growth. Sure, it's nowhere near as rigorous as "benefits - costs", but that is hard to really measure, and many decisions simply need adequate enough information to make a reasonable decision. In which case, he both "made it up" and made a claim that you can justify mathematically.


    And yet it's trivially true. Value accrues with connectivity, which is the number of edges in a fully connected graph, n(n-1)/2, which as n grows larger approximates to n^2. I would be surprised if he said he "made it up", other than as a joke about elementary computer science.


    As n grows larger, the number of edges approximates n²/2. I may be pedantic, but I feel that the difference between something and its half is non-negligible.
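    A quick numeric check of that point (nothing subtle here, just the complete-graph edge count):

    ```python
    def edges(n):
        # Edges in a complete (fully connected) graph on n nodes
        return n * (n - 1) // 2

    for n in (10, 1_000, 1_000_000):
        print(n, edges(n) / n**2)  # ratio approaches 0.5 from below as n grows
    ```

    Still O(n²) asymptotically, but the constant factor of one half never goes away.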


    You're assuming complete connectivity; no one builds networks of nontrivial size that way.

    Well deserved. I remember dealing with a whole raft of other networking technologies and Ethernet stood head-and-shoulders above anything else available at the time.

    One thing that is not well appreciated today is how power-efficient Ethernet was, even on launch in the coax era. Other network technologies (Token Ring as embodied by IBM's network cards, for instance) consumed power like there was no tomorrow, leading someone to quip that it should be renamed 'smoking thing'.

    As the price came down (around the NE1000/2000 and 3C509 era) it suddenly was everywhere and economies of scale wiped out the competition until WiFi came along. But even today - and as I'm writing this on my ethernet connected laptop - I prefer wired networks to wireless ones. They seem more reliable to me and throughput is constant rather than spotty, which weighs heavier to me than convenience.

    So thank you Bob Metcalfe, I actually think this award is a bit late.

    Anybody remember Don Becker?

    I had no idea that Token Ring was inefficient with power, but it certainly had a bunch of other problems. Biggest (at least on PCs) was its inability to recover from a cable being unplugged without resetting a bunch of the system, and the type-1 token ring cables win the award for being the most needlessly bulky,[1] even if the connectors had a plug-into-each-other party trick.

    [1] https://en.wikipedia.org/wiki/Gender_of_connectors_and_faste...

    I still have a soft spot in my heart for ARCNet. In the 80s it was cheaper than ethernet, but more reliable than token ring. And for the few places that prioritized determinism over throughput, it was indispensable.

    But ethernet kept improving speed and reliability while ARCnet retreated to shop-floor niche applications.

    Alas.


    ARCNet was nice, except for when people randomly decided to remove the terminator from the t-connector on the back of their desktop because "it looked weird", thus taking down the whole network. That happened to me more than once doing network support in college.


    ARCNet is mentioned heavily in The Big Bucks. I have to admit that I knew very little about it before doing research for the book.


    One more book on the stack... Now I have to read it to find out how ARCNet worms its way into a novel about sili valley.

    This part's not in the book: Gordon Peterson, the architect of ARCNet, was a major source for me. He talked to Bob back in the day.

    Gordon's still bitter about it, and will gladly tell you why Ethernet is inferior.

    Ethernet is one of those case studies in "worse is better".

    I remember the old saying that "Ethernet doesn't work in theory, but it does in practice". Mostly referring to the CSMA/CD scheme used before switches took over.

    The competitive advantage of being built out of cheap commodity hardware and cabling is hard to overstate. Nobody likes dealing with vendors, their salespeople, and especially support contracts. Especially since that is always more expensive and often solves problems you don't have, like minimum latency guarantees, at the cost of throughput and complexity.

    There were a lot of LAN schemes back then. Mostly forgotten now.

    Many press commentators opined that of course "broadband" would be much better than "baseband" since it could carry voice and video, not just bits.


    Agreed. Can't overstate the cost effectiveness. In the late 80s or early 90s you could put hundreds of dumb terminals on one network with just hubs for signal integrity. Plenty of collisions, but it all worked itself out somehow if the throughput was light, such as text applications, text email, and a small amount of printing or sharing. This meant every university could have some kind of network scheme, making it universal for the next generation.

    I even put a whole section with that old debate in Inventing the Future (Janet, working at Xerox, is arguing with her husband Ken, who's at Hughes Aircraft):

    ==============

    She’d heard that the product Xerox was going to ship was aiming for 20 megabits. [1] It was amazing. But, as people at Xerox explained, imagine an entire office full of knowledge workers with their own computers, sending documents and email to each other and to the printer. How much bandwidth would they need? A lot! Someone in Palo Alto had done a whiteboard exercise on the bandwidth required for teleportation, in other words, sending an entire human being through the wire. It wasn’t that they took the prospect of teleportation seriously, but still, it was fun, and she enjoyed being around people who had fun at work.

    Ken scoffed and insisted that Ethernet would never work. This was a recurring argument with them that was starting to annoy her, although usually she’d just smile and change the subject. He kept hurling the word deterministic as if it were a magic talisman. On the Ethernet, you weren’t certain how long you’d have to wait to transmit your data. If the wire was busy when you wanted to talk, you had to wait.

    If someone else tried to transmit exactly when you did, you both had to back off and try again. An engineer could give you the probabilities of various results, but no more. This was not real engineering, and Ken was offended by it. He was even offended by the word “ether” since any beginning physics student knows that ether was a bogus concept that was disproven ages ago.

    The entire communications field that the two of them had studied at MIT was based on strict mathematical calculations and guarantees. For example, when you make a telephone connection, that circuit is yours until you’re done. No other calls interfere with yours. The telephone companies had spent decades perfecting this system, and they had a monopoly. How could any company, even one as big as Xerox, hope to change that?

    She tried telling him that packet-switching was a real discipline, and the Defense Department itself was backing it. It was originally designed to survive a nuclear attack that destroyed some of the military’s communications lines. With packet switching, the message was broken up into pieces, or packets, and the packets might arrive on separate paths. Ken didn’t believe it would ever have commercial applications. People in aerospace had a low opinion of commercial stuff anyway.

    [1] The original speed goal for the Xerox Wire was 20 megabits/sec. The first controller, for the Dolphin, had independent send and receive buffers, but could only be made to fit on the board using 10-Mbps CRC chips from Fairchild. Furthermore, in the lab, Tony found that 20-Mbps signaling caused spurious collision detects on the cable due to transceiver tap reflections.


    There were a lot of LAN schemes - and slightly incompatible ethernet implementations. I remember when the Interop tradeshow in Vegas required vendors to either attach and integrate with the show network or they would get kicked off the floor. Good times!

    Well.. I still want to read the book. I'm a sucker for a well crafted story about old hardware from the days when technology gods walked the earth.

    I'm sure Ethernet's market domination is because the spec wasn't owned by a single company, and had nothing to do with its technical merits. After IBM's SNA, people seemed paranoid of a networking spec being owned by a single company. Do you know if Datapoint thought about that and whether they tried to build their own equivalent of the DIX consortium?

    I also think about SpaceWire / IEEE-1355 / Wormhole Routing and what might have been had we adopted systems where compute power could be easily upgraded.

    Oh! The good old days when everything was possible!

    on DataPoint: my hero (sort of) Matt Feingold spends a summer internship at DataPoint. As far as he (and I) could tell, people still thought in terms of "account control" back then.

    There's actually a book on DataPoint (and almost every other company from way back when). I read them so you don't have to :)


    I get the impression that 10BASE-T killed ARCNet, and it was the "T" rather than the "10" that did so. Running cheap CAT-5 to a set of interconnected hubs was just so much easier and more reliable than t-connectors, terminators &c.


    Never met Don Becker, but as it was the Beowulf project that got me interested in GNU/Linux, he is synonymous with ethernet drivers.

    >They seem more reliable to me and throughput is constant rather than spotty, which weighs heavier to me than convenience.

    They ARE more reliable.

    I'd much rather use ethernet than wifi on desktops and laptops.

    Now with video meetings, high-quality webcams, mics, and gaming, latency and bandwidth are king.

    WiFi is usually FAST but it is not as STABLE.


    Sure, years ago. But today Ethernet is just as scammy as everyone else; we've been stuck at 1 Gbps on consumer-grade hardware for more than 15 years. There are claims (unverified, of course) about executives boasting about their stupid margins. A 1 Gb switch is like 10-20 euros, meanwhile 2.5 Gbps is over 100...

    2.5Gb is downshifted 10Gb with the same line coding, just with 1/4 the symbol rate. This means that it inherits all the complexities of 10GbE, while tolerating cheaper connectors and cables. 10GbE uses DSQ128 PAM-16 at 800Msym/s. 2.5G just does quarter-rate at 200Msym/s.

    1000BaseT uses trellis coded PAM-5, a significantly less complex modulation.

    When one factors in the complexity of the line code and all equalisation and other processing in the analog frontend things get expensive. Copper 10Gb interfaces run very hot for a reason. It takes quite a bit of signal processing and tricks to push 10Gb over copper, at least for any significant distances.

    It's not really about what the cable can handle, but more about what it's specified to handle at maximum length in a dense conduit.

    At shorter lengths, and in single runs, it's always worth trying something beyond what the wiring jacket says. I've run gigE over a run with a small section of cat3 coupled to a longer cat5e run (repurposed 4-pair phone wire), and just recently set up a 10G segment on a medium length of cat5e. The only catch: while I think 2.5G/5G devices do test for wiring quality, the decimal speeds don't, and auto-negotiation happens on the 1Mbps link pulses, so unmanaged devices can easily negotiate to speeds that won't work. If your wiring is less than spec, you need to be able to influence negotiation on at least one side, in case it doesn't work out.


    I can't make heads or tails of your comment. What is scammy about Ethernet and what 'stupid margins' does Ethernet have? It's a networking standard, not a company.


    2.5G or even 10G is not that much more expensive, and companies making consumer electronics sell it at a considerable premium for what is essentially the same cost difference as making an 8GB vs 16GB flash drive. Of course, regular internet users don't need more than 2.5G (and couldn't use it in most of the world due to ISP monopolies), so anything faster than gigabit is a target for segmentation.

    The market at work. There is just no real demand for anything beyond 1G.

    The HN crowd is not representative of what would be needed to drive the price tags down on 2.5G stuff.

    If you have a gigabit internet connection, then most of the value of 10G comes from data sharing within the intranet, which just never caught on outside of hobbyists. And a 1G switch can still handle a lot of that. You don’t even need 10G for LAN parties, and whether backups can go faster depends on the storage speed and whether you actually care. Background backups hide a lot of sins.

    I’m hoping a swing back to on-prem servers will justify higher throughput, but that still may not be the case. You need something big to get people to upgrade aging infrastructure. What would be enough to get people to pay for new cable runs? 20Gb? 40?

    Rant aside, I think there is an argument to be made that 2.5gbps switches "should" be cheaper now that 2.5gbps NICs have become fairly commonplace in the mainstream market.

    Case in point, I have a few recent-purchase machines with 2.5gbps networking but no 2.5gbps switch to connect them to because I personally can't justify their cost yet.

    I suppose I could bond two 1gbps ports together, or something, but I like to think I have other yaks to shave right now.

    You can get some basic switches that do 2.5gb but it's like $100, a bit more for a brand you might recognize.

    https://www.amazon.com/5-Port-Multi-Gigabit-Unmanaged-Entert...

    Personally I went with Mikrotik's 10gb switch, but that needed SFP port thingies (which was fine for me, as I was connecting one old enterprise switch via fiber, direct copper to two servers, and using cabled cat7 or whatever for the Mac).

    2.5gb is silly in my opinion unless it's literally "free" - you're often better with old 10gb equipment.

    > 2.5gb is silly in my opinion unless it's literally "free" - you're often better with old 10gb equipment.

    I think 2.5g is going to make it in the marketplace, because 2.5g switches are finally starting to come down in price, and 10g switches are roughly twice the price, and that might be for sfp+, so you'll likely need transceivers, unless you're close enough for DAC. (NIC prices are also pretty good now, as siblings noted. But if you go with used 10G, you can get good prices there too, I've got 4 dual 10G cards and paid between $25 and $35 shipped for each)

    Yeah, it's that cost that is the problem. If I'm paying over a hundred bucks for a switch I might as well go higher and consider 10gbps options.

    2.5gbps hardware needs to come down to at least the $30 to $40 range to make any sense. Otherwise, it'll stay niche hardware specifically for diehard enthusiasts or specific professionals only.


    The NICs can be had for $20 (pretty sure I saw a $11 one the other day but can't find it right now on mobile).


    The NICs are reasonable now, yes. The issue is the thing on the other side of the cable; 2.5gbps switches and routers need to come down in price.

    The problem with 2.5G is that it's not enough of an upgrade over 1G to warrant buying all new switches and NICs to get it. For that matter few home users push around enough data for 10G to be a big win.

    IMHO this is why Ethernet has stalled out at 1G. People still don't have large enough data needs to make it worthwhile. See also: the average storage capacity of new personal computers. It has been stuck around 1TB for ages. Hell, it went down for several years during the SSD transition.

    2.5gbps is literally 2.5x the speed of gigabit ethernet, so it's going to be very noticeable even for most home users if they do any amount of LAN file sharing.

    It's really just the cost that's the problem, because paying 4x to 6x the cost of gigabit hardware for a 2.5x performance boost doesn't make a lot of sense.

    If 2.5gbps peripheral hardware costs come down, I will happily bet it will take off.


    This assumes that the LAN is the bottleneck. Gigabit ethernet tops out at 120MB/s, which is about the speed of spinning rust on a NAS.

    Yeah, but you probably have more than one drive RAID'd in that NAS so you will almost certainly get faster transfers (granted: sequential) if ethernet wasn't the bottleneck.

    2.5gbps ethernet translates to roughly 250MB/s in real-world transfer speeds; that's a lot. Literally over double real-world gigabit transfer speeds, and far less likely to bottleneck you.
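    Those rules of thumb fall out of frame overhead; a rough sketch assuming a standard 1500-byte MTU and plain TCP/IPv4 (no VLAN tags or jumbo frames):

    ```python
    def tcp_payload_mbs(link_bps, mtu=1500):
        # Per frame on the wire: preamble 8 + Ethernet header 14 + FCS 4
        # + interframe gap 12 = 38 bytes of overhead; IPv4 + TCP headers
        # take another 40 bytes out of the MTU.
        wire_bytes = mtu + 38
        payload_bytes = mtu - 40
        return link_bps / 8 * payload_bytes / wire_bytes / 1e6

    print(round(tcp_payload_mbs(1e9)))    # ~119 MB/s for gigabit
    print(round(tcp_payload_mbs(2.5e9)))  # ~297 MB/s for 2.5G
    ```

    Real transfers land a bit below these ceilings once TCP dynamics and disk speed get involved, which is where the ~120/~250 figures come from.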


    But that has nothing to do with Ethernet as such, which isn't a 'company making consumer electronics'.


    You may actually be right, sorry, my rant may have been misguided. "Networking standard" doesn't make it free of royalties though; don't/didn't companies pay to use the WiFi protocol?


    It's a big loss that wired networking speeds have plateaued, but I feel it's more about apps and people adapting to slow and choppy wireless networks, which penalize apps that leverage quality connectivity and stand as bottlenecks in home networks (e.g. you don't need 10G broadband; the wifi will cap everything to slow speeds anyway). And mobile devices, which had much smaller screens and memories than computers for a decade+, stalled the demand driven by Moore's law.

    People buy ethernet for a reliable connection and reliable latency (no packet drops), and to get 1Gbps. Few consumers have need for more, since internet speeds also rarely exceed 1Gbps.

    Sure, anyone with a NAS might like more, but that's a tiny market. And tiny markets lack economy of scale, causing prices to be high.

    You can have 10G with eg, Mikrotik at a reasonable price.

    One problem with it is that the copper tech is just power hungry. It may actually make sense to go with fiber, especially if you might want even more later (100G actually can be had at non-insane prices!)

    Another problem is that it's CPU intensive. It's actually not that hard to run into situations where quite modern hardware can't actually handle the load of dealing with 10G at full speed especially if you want routing, a firewall, or bridging.

    It turns out Linux bridge interfaces disable a good amount of the acceleration the hardware can provide and can enormously degrade performance, which makes virtualization with good performance a lot trickier.

    You can go fast if you don't do anything fancy with the interface.

    If you say, want bridged networking for your VMs and add your 10G interface to virbr0, poof, a good chunk of your acceleration vanishes right there.

    Routing and firewalling also cost you a lot.

    There are ways to deal with this with eg, virtual functions, but the point is that even on modern hardware, 10G can be no longer a foolproof thing to have working at full capacity. You may need to actually do a fair amount of tweaking to have things perform well.


    The other issue is that unless your computer is acting as a router or a bridge, you need to do something with that 10GB data stream. SSDs have only recently gotten fast enough to just barely support reading or writing that fast. But even if you do find one that supports writes that fast, a 10GbE card could fill an expensive 4TB drive in less than an hour. Good luck decoding JPEGs and blitting them out to a web browser window that fast.

    >10GB data stream. SSDs have only recently gotten fast enough to just barely support reading or writing that fast.

    10gbps (gigabits per second) is not 10GB/s (gigabytes per second).

    Specifically, 10gbps is approximately 1.25GB/s or 1250MB/s.


    Consumer SSDs used to max out at about 550MB/s, and some still do. You need a larger and more modern drive to do 1.25GB/s sustained writes. Even then, buffering can get you.

    That's due to the communication protocol.

    2.5 inch and M.2 SATA SSDs max out around 550MB/s due to the limits of SATA3 connections which cap out at 6gbps.

    M.2 NVME SSDs meanwhile communicate over PCIE, generally using four lanes, and the latest PCIE5 SSDs can do around 15GB/s if I recall. PCIE4 drives can get up to around 7GB/s, and PCIE3 drives up to around 3GB/s.

    Other potential bottlenecks can occur with the motherboard chipset, controller, and NAND flash, but details.

    TL;DR: Any NVME SSD can saturate a 10gbps ethernet connection.
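    Comparing those interface ceilings against 10GbE's nominal 1.25GB/s (rough sequential figures for illustration, not benchmarks):

    ```python
    TEN_GBE_GBS = 10e9 / 8 / 1e9  # 1.25 GB/s nominal, before protocol overhead

    ceilings = {  # approximate sequential throughput limits, GB/s
        "SATA3 SSD": 0.55,
        "PCIe3 x4 NVMe": 3.0,
        "PCIe4 x4 NVMe": 7.0,
        "PCIe5 x4 NVMe": 14.0,
    }
    for name, gbs in ceilings.items():
        verdict = "can saturate" if gbs > TEN_GBE_GBS else "cannot saturate"
        print(f"{name}: {verdict} 10GbE")
    ```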

    TCP/IP offload isn’t the issue.

    The core problem is that the Linux kernel uses interrupts for handling packets. This limits Linux networking performance in terms of packets per second. The limit is about a million packets per second per core.

    For reference, 10GE is about 14.88 million packets per second at line rate using minimum-size packets.

    This is why you have to use kernel bypass software in user space to get linerate performance above 10G in Linux.

    Popular software for this use case utilizes DPDK, XDP, or VPP.
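    The line-rate packet figure can be derived from the minimum frame size; a quick sketch, counting the 8-byte preamble/SFD and 12-byte inter-frame gap that each frame occupies on the wire:

    ```python
    # Line-rate packets/s for 10GbE with minimum-size frames: each
    # 64-byte frame occupies 84 bytes of wire time once the 8-byte
    # preamble/SFD and 12-byte inter-frame gap are counted.
    frame, preamble, ifg = 64, 8, 12
    wire_bytes = frame + preamble + ifg     # 84 bytes per frame slot
    pps = 10e9 / (wire_bytes * 8)
    print(pps)                              # ~14.88 million packets/s
    ```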

    You don't need an interrupt per packet, at least not with sensible NICs and OSes. Something like 10k interrupts per second is good enough, pick up a bunch of packets on each interrupt; you do lose out slightly on latency, but gain a lot of throughput. Look up 'interrupt moderation', it's not new, and most cards should support it.
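    As a rough worked example of why moderation helps (figures assumed from the discussion above: 10GbE line rate with small packets, ~10k interrupts per second):

    ```python
    # Rough effect of interrupt moderation: amortize each interrupt
    # over a batch of packets instead of taking one per packet.
    line_rate_pps = 14.88e6                 # 10GbE, minimum-size frames
    interrupts_per_sec = 10_000             # moderated interrupt rate
    packets_per_interrupt = line_rate_pps / interrupts_per_sec
    print(packets_per_interrupt)            # ~1488 packets per interrupt
    ```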

    Professionally, I ran dual Xeon 2690v1 or v2 to 9Gbps for HTTPS download on FreeBSD; HTTP hit 10G (we only had one 10G link to the internet on those machines), but crypto took too much CPU. Dual Xeon 2690v4 ran to 20Gbps, no problem (2x 14-core Broadwell, much better AES acceleration, faster RAM, more cores, etc., with dual 10G to the internet).

    Personally, I've just set up 10G between my two home servers, and can only manage about 5-8Gbps with iperf3, but that's with a Pentium G2020 on one end (dual-core Ivy Bridge, 10 years old at this point), and the network cards are configured for bridging, which means no TCP offloading.

    Edit: also, check out what Netflix has been doing with 800Gbps, although sendfile and TLS in the kernel cuts out a lot of userspace, kind of equal but opposite of cutting out kernelspace, http://nabstreamingsummit.com/wp-content/uploads/2022/05/202...

    Interrupt moderation only gives a modest improvement, as can be seen from the benchmarking done by Intel.

    Intel would also not have gone through the effort to develop DPDK if all you had to do to achieve linerate performance would be to enable interrupt moderation.

    Furthermore, quoting Gbps numbers is beside the point when the limiting factor is packets per second. It is trivial to improve Gbps numbers simply by using larger packets.

    I'm quoting bulk transfer, with 1500 MTU. I could run jumbo packets for my internal network test and probably get better numbers, but jumbo packets are hard. When I was quoting HTTPS download on the public internet, that pretty much means MTU 1500 as well, and that was definitely the case.

    If you're sending smaller packets, sure, that's harder. I guess that's a big deal if you're a DNS server, or voip (audio only); but if you're doing any sort of bulk transfer, you're getting large enough packets.

    > Intel would also not have gone through the effort to develop DPDK if all you had to do to achieve linerate performance would be to enable interrupt moderation.

    DPDK has uses, sure. But you don't need it for 10G on decent hardware, which includes 7 year old server chips, if you're just doing bulk transfer.

    Bulk transfers aren't that interesting from a networking perspective.

    You're gonna have a bad time if you optimize only for the best-case scenario.

    Even using IMIX is a low bar. The proper way to do things is linerate using small packets.


    Most Linux network drivers have supported NAPI for a couple of decades now. No panacea of course, but still, far from having one interrupt per packet.

    I think around 2011, they offered the first UT Longhorns Startup course, it was cool and hip and new, and they'd flown in mentors from SV and other places, so I figured, why not?

    So, after applying, I had shown up at a hotel near campus. While waiting in the lobby, playing with their unsecured wifi, a rather distinguished looking gentleman came up to me, and asked, Hey are you here for the Startup Course interviews?

    Yeah...

    Well, why are you here in the lobby?

    Well, I was told to wait here, and it's been a half hour and nobody's called me.

    He gave me a look, direct in the eyes, and said, oh, really? And you're just going to sit here and wait?

    I was dumbfounded. Of course, it made sense, but it felt... I didn't want to piss off the organizers, right?

    "Go in there, and get it!" as he clawed the air like a tiger. Damn, he was right.

    So I ambled in, looked around, found a seat near the guy organizing (Josh Baer, another awesome guy), introduced myself, and sat at a table by myself, just waiting for an in...

    Then the gentleman from the lobby came in and sat in front of me, with a big grin.

    Hi?

    Hi.

    You're a part of this?

    Yes, my name's Bob Metcalfe.

    Cool, thanks for the pep talk. So, what's your story?

    Well, I founded 3Com, and helped come up with Ethernet.

    Oh... damn.. cool..

    ...And my life has never been the same since!

    If you read this, thanks Bob.

    > "Metcalfe insists on calling Wi-Fi by its original name, Wireless Ethernet, for old times’ sake."

    Okay, besides all his contributions, I've decided this guy is my favorite for that alone. Imagine if he was your (great?) uncle and you're on a family vacation together. "What's the Wi-Fi password here, Bob?"

    Bob: "What's the what now?"

    You: "Excuse me. What's the Wireless Ethernet password?"

    Bob: "Oh, it's HotelGuest2023"

    The archetypal "good enough" solution:

    Instead of preventing collisions, tolerating and managing them.

    I think of Ethernet often when assessing how close to perfection I need to get in my work.


    It is also a lesson in doing something now and rewriting it later. For example, no modern Ethernet network uses CSMA/CD anymore, and it was a pretty iconic part of the original Ethernet. Overall, Ethernet's physical layer has seen quite an evolution: from coax and vampire taps, to twisted pair and hubs, to switched networks, and nowadays wireless, single-pair, optical, and virtual networks.

    You left out a step: ThinNet coax, without vampire taps!

    That's what was at 3Com when I joined in 1985. I even have a section in The Big Bucks where I took down the entire company for a few seconds by disconnecting the coax. No one noticed.


    Ethernet is also an example of a tech that has an easy scaling path: hubs with switched uplink ports made it really easy to divide collision domains. In the early days before everything was switched you could instantly reduce collision losses with a little bit of hardware in the server closet with no other changes to the network.


    I remember when hubs were still common; I don't know if any have been made for decades. Even bargain basement switches are switched now, and often even have spanning tree and other 'previously enterprise' features.

    Hubs max out at 100Mbps. Everybody today is using Gigabit, so they're effectively extinct.

    Even at 100Mbps hubs were on the way out. They were pretty hacky. The hardware had two different hubs internally and joined them together with a bit of logic, but that logic was somewhat failure prone and it was common to have 10/100 hubs where the 10 clients couldn't talk with the 100 clients and vice versa. Autodetection was at best a roll of the dice so most people wired down their port settings instead. Everybody hated them and switches got cheap real fast so they didn't last very long. The only thing they were good for was network diagnostics.

    > The only thing they were good for was network diagnostics.

    Indeed - I still have a couple that I used for packet sniffing. Thankfully managed switches or switches smart enough to support port mirroring are inexpensive and thus fairly ubiquitous now.


    True, but it did detect transmission in progress (carrier sensing) which helped to avoid collisions in the first place.

    Ethernet was always inefficient, with a crazy amount of unused legacy space reserved in an unnecessarily large header. CSMA/CD for contention was one of the ugliest medium access solutions imaginable. The coax implementation needing termination plugs was also ugly. Its advantage was cost, having had no license fees, making it suited to consumer/commercial applications driving economies of scale. It's the VHS of datacomms.

    It's evolved, thankfully, but it remains an ugly, inefficient standard that only has life because of its legacy. And it's been increasingly jimmied into professional, carrier applications for which it was never intended and where far superior, though more expensive solutions already existed.

    That's not to say its creators don't deserve credit. It did its job well enough for its early days. But that's why this award comes too late. Because now Ethernet is the bloated, inelegant dinosaur we've built an ecosystem around, but to admire it is to forget the competitors it drove to extinction along the way.

    It was a lot less ugly than whatever else passed for networking standards at the physical level in those days.

    Arcnet, Twinax, Token Ring and so on, I've probably used them all, and at scale. Compared to Ethernet they all sucked, besides being proprietary they were slow, prone to breaking in very complex to troubleshoot ways (though ethernet had its own interesting failure modes in practice it was far more reliable), and some used tons of power which made them unusable for quite a few applications. On top of that it was way cheaper and carried broad support from different vendors, which enabled competition and helped to improve it and keep prices low.

    Oh good heavens. Arcnet! When I first learned about writing Linux device drivers, it was trying to get a decent driver for some Arcnet cards that the company I worked at was using in some client installations. Can't remember exactly why we never completed it (well, yeah I do: Ethernet worked better, a lot better), but since we never "released the product" they never let us send the driver we did write to the kernel mailing list. That was in the kernel 1.x days.

    Now, I feel old. Time for a nap.

    Ethernet evolved in a backward-compatible way for more than 30 years. If we designed a new standard from scratch to fit the same use cases, we could in theory learn from the experience and improve things, but at the same time it would be hard to resist the temptation to make it future-proof by adding a lot of things just in case, and this new standard would likely be even more wasteful. And having the opportunity doesn't mean it will be used. I often see new designs make mistakes avoided in older designs, because people have limited time to learn and the body of knowledge is too large to always successfully learn from the past.

    Also hardware is not like software where you can rewrite a site using a JS framework of the day every few years. Compatibility is really important.


    You could not be more wrong! Efficiency and overhead are measured as a percent of frame size, and 128-byte packets (X.25) or 48-byte frames (ATM) are abortions. 1500 bytes at the outset and the overhead is < 1%, and < 0.2% with jumbograms (8kB). Every 802.11 standard is a superset of Ethernet, and that makes DIX Ethernet the most scalable network protocol of all time!


    Do you mean subset? It was first standardized as 802.3. Contention under CSMA/CD meant it was not scalable - as in it became inefficient as the segment grew. But you're right and I stand corrected in sense of the header/frame length ratio. I'd edit that first sentence if I still could.


    What are some superior competing standards, and could they be implemented in a royalty-free way?

    The point was more about competing technical choices made by designers, rather than the choice of standards made by consumers. For example TDMA can be arguably more scalable, bandwidth and energy efficient than CSMA/CD and can give consistent PL, PD and PDV, so might have even allowed early business grade voice. Variable header sizes would have allowed efficient use over bandwidth constrained media like radio. But the low cost and fast success of Ethernet formed a barrier to entry for competing LAN standards, where those arguably better technical choices may have found a footing.

    They eventually found application in other non-LAN standards, so guess royalties weren't an issue.

    None of those things could be implemented in the 1970 or 1980s at reasonable cost so they're not actually solutions at all.

    Hell even making Ethernet fully switched didn't really happen until the 1990s thanks to Moore's law making the ASICs cheap enough.

    Without mass adoption there's no reason to invest. Look at Token Ring, Ethernet's only real competitor at scale: it quickly started to lag behind. Ethernet shipped 100Mbps several years before Token Ring. The 1Gbps Token Ring standard was never put to hardware.


    TDMA is an extension of TDM, which goes back to the 60's. Synchronization was already solved. Variable header size could be implemented with the same preamble concept already used by Ethernet, but used to indicate the end of the header. These were not hard problems. The technology existed, the affordability would have largely depended on adoption, so it's hard to say.


    We'll have to agree to disagree. Obviously TDM was known but implementing it for ethernet at a reasonable cost was just not an option at the time (in my opinion).

    TDMA needs time synchronization and thus becomes more complex.

    Even in telecoms the packet switched connections are quickly replacing synchronous time division connections.


    The recent-ish 10BASE-T1 uses something called PLCA instead of CSMA/CD, which doesn't require time synchronization and gives each node in a subnet a dedicated transmission slot.

    10BASE-T1L is point-to-point; 10BASE-T1S is multidrop, but very limited in node count and in how long the branches/stubs can be.

    We'll see how it actually performs in the field. Microchip seems to be in the T1S boat, and TI+AD in T1L.

    The best tech doesn't always win, and in fact the "best" tech is typically promoted by people who focus more on the tech and less on go-to-market and competitive strategy. And thus, the "best" tech often loses to the tech that (for example) is better packaged or promoted.

    Python is a nice example: inelegant language with many deep flaws, but easy syntax and "batteries included" won the day.

    That's just not true. Shoddy builders put quality builders out of business all the time.

    Guess it depends on your faith in markets and your definition of superior.

    My memory is that every ring topology had pretty nasty failure characteristics around "a single misbehaving/failing client."

    Which Ethernet has too, but can generally tolerate a much higher level of imperfect reality, while still providing degraded service.

    Before you could get plentiful high-quality NICs and cabling, graceful degradation was a killer feature.

    "Captain Bob" we called him, at 3Com.

    In "The Big Bucks" I have two quotes from him, which he graciously allowed me to use as something he would have said (they're not very exciting). Normally I never have a real person appear and do anything; at most people speak of them in the third person.

    In "Inventing the Future" I have the 1978 story about the lightning strike that took down the Ethernet between PARC and SDD. Bob had actually forgotten it, but he remembered the second lightning strike that helped sell Ethernet, because Ron Crane (RIP) had remembered the first one and engineered the Ethernet card to withstand them. As luck would have it, during a competition there actually was a lightning strike, and 3Com's survived it while the competitor's didn't.

    Congrats Bob!

    If anyone's interested in the history of the early internet, I recently read the book "Where Wizards Stay Up Late" by Katie Hafner, and it is a very interesting read about how we went from ARPA to the WWW, including a lot of the warts you'd associate with a large-scale project like ARPANET (and the book features Metcalfe quite extensively when talking about Ethernet and ALOHAnet).

    Honestly, it's nice to see technology like ethernet, which is both "as simple as it should be but no simpler", and has also stood the test of time get recognized and rewarded!

    Now if we could only break away from the frame size limit and have working jumbo frames without a lot of pain.

    Having millions of packets per second is starting to get a bit ridiculous. Even 10G is still challenging, not to speak of 100G.


    It's more a personal account; it's 1 year old and opens with a tweet on cancel culture, followed by his political leanings, cryptocurrency posts, and an overwhelming amount of basketball stats. It does have the occasional post about his involvement in geothermal energy, but beyond that, following the account isn't going to get you any of the content he's known and respected for.


    If he was tweeting anecdotes, or still working (seems he might be?) and tweeting about it, it'd probably be a lot more - but it's mostly US basketball (and other sports/personal stuff) by the looks of it. So, however veritable a hero, it's just not interesting to the same audience (or at least, for the same reason; of course some of us will be basketball fans) - and if you're big into 'basketball Twitter' he's probably ~'nobody'.


    This is like when I heard Roger Penrose won a Nobel Prize in 2020 and I thought for a second "wait is this his second? What? You mean he hadn't been awarded one until now? Who was in line ahead of him and for what?"

    Reading the original ethernet paper was one of my favorite moments in college. Just a brilliantly pragmatic design (especially handling packet collisions with randomized retransmissions).

    Made me appreciate how important it is for something to be simple and pragmatic, rather than over-engineered to perfection.


    I think in part what you are witnessing there is the power of a single well informed individual over a committee, which is how the competition was doing it.

    Somewhat related:

    The choice of 48 bits for the hardware/station address seems to have been a pretty good one: it's been 40+ years and we still have not run out. I'm curious whether anyone has done the math on when Ethernet address exhaustion will occur.

    While the Ethernet frame has been tweaked over the decades, addressing has been steady. Curious to know if any transition will ever be needed and how that would work.

    In hindsight, IP's (initial) 32-bit address was too small, though for a network that was (primarily) created for research purposes but ended up escaping 'into the wild' and accidentally becoming production, it was probably a reasonable choice: who expected >4 billion hosts on an academic/research-only network?


    We're unlikely to ever actually run out. Ethernet addresses are expected to be universally unique, but they're only required to be unique within a collision domain. If someone started reusing addresses from 3c503s, chances are high nobody would notice. If we did run out, devices would need to start generating randomized addresses, and maybe probe for collisions, which isn't unworkable; the number of nodes in a collision domain tends to be low, and the space is large, so you might barely need to probe for collisions at all if you have a good random source.

    Some quick napkin math on the current MAC vendors database: 46 bits of a MAC address are reserved for universally administered unicast (i.e. a globally unique MAC assigned to identify a device). So far we have assigned ~570 billion addresses via 24/28/36-bit range assignments for that purpose, which represents a little under 1% of the space. So nothing urgent, though if we stick with Ethernet as much as we use it today, then in <100 years I wouldn't be surprised if we were "out".

    At the same time there are also 46 bits of locally administered unicast addresses and, unlike IP, Ethernet addresses only care about the local network (and this isn't a "because we've co-opted them to save space just like NAT broke IP protocols" thing; rather, it's the design intent of Ethernet). Even if you had 10 billion LANs with 100 devices each, and they all used this random non-unique assignment, there would only be a ~50% chance that one or more devices anywhere would have a collision.
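    That ~50% figure is a standard birthday-problem estimate; a quick check of the numbers above:

    ```python
    import math

    # Birthday-problem check of the ~50% claim: 10 billion LANs, each
    # with 100 random 46-bit locally administered addresses.
    space = 2 ** 46
    pairs = math.comb(100, 2)               # 4950 address pairs per LAN
    p_lan = pairs / space                   # collision prob. in one LAN (approx.)
    expected = p_lan * 10e9                 # expected number of colliding LANs
    p_any = 1 - math.exp(-expected)         # P(at least one collision anywhere)
    print(p_any)                            # ~0.5
    ```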

    The only real advantage I've ever found of programming in unique MAC addresses vs random MAC addresses is that you can look up what company the MAC was assigned to. It may seem like there is a risk that random assignment can be done poorly (e.g. not very randomly), but honestly the same risk exists with assigned ranges, as seen with network vendors cheaping out and re-using their MAC blocks (which is significantly more likely to conflict than if they had just used random locally administered addresses in the first place).

    So when XSIS (Xerox System Integration Standards?) first started selling MAC address blocks in the early 1980s, I think it cost $1k for 16M MAC addresses (168 addresses per penny). So there are only 16M blocks of MAC addresses available, and it IS possible to run out if vendors waste the addresses. I don't know what registration costs from the IEEE right now, but once the equivalent of $16B is spent on MAC addresses, we will run out of blocks.

    Maybe today they are selling smaller blocks, but the MAC is basically divided into 16M blocks (specified by the first 24 bits / 3 bytes, each with a registered owner or "unassigned") and 16M MACs within each block (the lower 24 bits). That's why you can enter a MAC address into a lookup website and find out who made the NIC (google "mac address lookup"):

    https://aruljohn.com/mac.pl
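    A minimal sketch of that 24/24-bit split (the `split_mac` helper is hypothetical, just for illustration):

    ```python
    def split_mac(mac: str):
        """Split a MAC into its 24-bit OUI (registered vendor prefix)
        and the 24-bit vendor-assigned part."""
        octets = mac.replace("-", ":").lower().split(":")
        oui = ":".join(octets[:3])          # the part a lookup site resolves
        nic = ":".join(octets[3:])          # assigned by the vendor
        return oui, nic

    print(split_mac("00:1A:2B:3C:4D:5E"))   # ('00:1a:2b', '3c:4d:5e')
    ```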


    It was Xerox & Yogen Dalal's choice, not Bob's choice! Xerox blew it with 8-bit station addresses in Pup (PARC Universal Packet) and wanted to give each station a UID to break ties in database transactions, hence the 48 bits. XNS actually had 3-byte station and 3-byte network sizes to fit in 6-byte MAC addresses! Metcalfe is not a software engineer and wouldn't have had these insights ...


    I've got his book Packet Communication, and the acknowledgements end with "Don't let the bastards get you down."


    Is this supposed to be an Easter egg? I found the book and looked for the context of the statement, but I did not find anything interesting after the acknowledgements.


    Ah, the best and most readily available source of makeshift jumper wires. Truly an amazing contribution even in ways it wasn't quite designed to be :)

    The ACM Turing award committee has its head up its ass? Seriously, 25% of winners have NO impact on the field ...

    Metcalfe was controversial because ALOHAnet from the University of Hawaii pioneered the idea, and Metcalfe was seen as writing a nice proof in CACM of the 1/e capacity breakdown theorem and popularizing an already extant technology. He did not build it alone; Chuck Thacker probably built most of it, but didn't have a PhD! Oh the horror!

    He should not have gotten it now - either give it sooner or not at all - and he should not be the only one getting it!


    I hear that Metcalfe has been spending some time recently at the MIT Julia Lab working on climate issues.

    The fact that it allowed for all kinds of topologies, and that it served as a bus (shared medium, hence the name 'Ether') rather than a point-to-point link is what I think made the biggest difference.

    Of course now that we all use switched links they are point-to-point again but an ethernet 'hub' gave you the same effect as a bus with all devices seeing all of the traffic. This made efficient broadcast protocols possible and also allowed for a historical - but interesting - trick: the screw-on taps that you could place on a single coaxial cable running down a department giving instant access without pulling another cable. Zero network configuration required, just get the tap in place and assign a new address. DHCP later took care of that need as well.

    This was fraught with problems, for instance a single transceiver going haywire could make a whole segment unusable and good luck finding the culprit. But compared to the competition it absolutely rocked.

    To build on your comment, although it's been years since I studied Ethernet in depth...

    - (On the bus thread) Ethernet started from an assumption of bad behavior (out-of-spec cabling, misbehaving clients, etc.) and tightened requirements just enough to construct a useful network. A much better balance between de facto ruggedness and performance than its peers.

    - From the beginning, Ethernet reasoned that it was cheaper to put logic in purpose-built networking hardware than endpoints (i.e. PC network adapters). This was a better scaling tradeoff. 1x $$$ network device + 100x $ client adapter vs 1x $$ networking device + 100x $$ client adapter.

    - Because of the above, you started to get really cost- and data-efficient networks when the cost of Ethernet switches plummeted. (Remember, in early Ethernet days, networks were hub/broadcast-only!)

    Ethernet switches are actually pretty complex things, when you think about it. They have to learn which MAC addresses are behind each port, build a forwarding table, and do table lookups in real time. The larger the switch, the more complex it is. It's hard to make it scale.
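    A toy sketch of that learn-and-forward behavior (the `LearningSwitch` class is hypothetical; real switches do this in ASICs, with aging timers and much more):

    ```python
    # Sketch of MAC learning: remember which port each source address
    # was seen on, and forward to the learned port (or flood if unknown).
    class LearningSwitch:
        def __init__(self):
            self.table = {}                     # MAC address -> port

        def handle(self, in_port, src, dst):
            self.table[src] = in_port           # learn/refresh source
            out = self.table.get(dst)
            return out if out is not None else "flood"

    sw = LearningSwitch()
    assert sw.handle(1, "aa", "bb") == "flood"  # "bb" not yet learned
    assert sw.handle(2, "bb", "aa") == 1        # "aa" was learned on port 1
    ```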

    Around the same era, Myrinet switches offered higher bandwidth (1.2Gb/s if I remember correctly) and higher density at a fraction of the port cost of slower Ethernet switches. This was possible because the Myrinet switches were dumb. The Myrinet network elected a "mapper" that distributed routes to all NICs. The NICs then prepended routing flits to the front of each packet. So to forward a packet to its destination, all a Myrinet switch had to do was strip off and read the first flit, see that it said "exit this hop on port 7", and then forward it on to the next switch. Higher densities were achieved with multiple chips inside the cabinet.

    In the mid 2000s we even built what was, at the time, one of the world's largest Ethernet switches using (newer, faster) Myrinet internally, encapsulating the Ethernet traffic inside Myrinet. That product died due to pressure from folks that were our partners but felt threatened by our incredibly inexpensive high-density switches.

    https://www.networkworld.com/article/2323306/myricom-rolls-o...

    EDIT: fixed routing flit description, added link to PR
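    The source-routing trick described above can be sketched in a few lines (hypothetical names; real Myrinet used hardware flits on the wire, not Python lists):

    ```python
    # Each switch pops the leading routing flit and forwards out that
    # port; no forwarding tables needed in the switch itself.
    def switch_forward(packet):
        port, *rest = packet        # strip the leading routing flit
        return port, rest           # (output port, remaining packet)

    pkt = [7, 2, "payload"]         # sender-chosen route: port 7, then port 2
    port, pkt = switch_forward(pkt) # first hop exits on port 7
    port, pkt = switch_forward(pkt) # second hop exits on port 2
    print(port, pkt)                # 2 ['payload']
    ```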

    Sounds similar to InfiniBand, where each subnet has a subnet manager which calculates routing tables for the entire subnet and assigns 16-bit local identifiers (LIDs) so stations don't need to use the full 16-byte GUIDs.

    Also, InfiniBand packets are power-of-two sized, making fast switching easier.

    Neat! (Re: you and parent)

    At their core, most hardware evolutions seem like optimizing compute:memory:storage:functionality vs the (changing) current state of the art/economy.

    When Ethernet was first released, compute was expensive. Made sense to centralize compute (in routers) and make everything else dumb-tolerant.

    Now, compute is cheap and plentiful at network-calculating scales and throughput expectations are very high, so it makes sense to burn compute (including in clients) to simplify routing hardware.

    Ha! But they delivered that much value (or more), so the market supported the price until supply flooded.

    We could do worse for a transformative technology ranking metric than "How overpriced was this when first released?" (significant look at Nvidia cards)

    I had a bunch of workloads that quite literally got cut down to about 15% of the original runtime (a cluster compressing a whole archive of CDs for a large broadcaster), so I happily paid up. But still... $1000/port!!

    And here I have, sitting next to me, a 48-port gigabit switch that cost 15% of what that 100-megabit switch cost in 1996 or so. IIRC it was one of the first D-Link products I bought; it definitely wasn't great (it ran pretty hot) but it worked well enough. Amazing progress.

    And you can get switches for less than $25 per 10gb port now.

    Of course the jump from 10mb hub to 100mb switch was much larger than any of the later jumps, just because of the reduced noise.


    For years I carried around an Ethernet splitter that would allow me to connect two devices to one Ethernet port. I last used it some 10 years ago in a place without wifi.


    Yeah, it’s a very cool trick that surprises a lot of people when they learn that only half the wires are used.


    Not for gigabit ethernet and good luck picking up the pieces if you find yourself splitting a power-over-ethernet setup :)


    Using a splitter is usually a temporary solution, and I'm unlikely to be sharing a port with a PoE device. Nor do I care about gigabit speed when the only reason to use a splitter is to make up for missing wifi.


    And it was sniffing heaven. Only paralleled by the brief period of nobody using any serious encryption on their wifi.


    Where "brief" was about 10 years, which at the time was about 25% of all time that networks were common.


    At the time, maybe. Eventually it will be remembered as a short glitch in tech history.


    Yup! A whole other real-time industrial protocol called EtherCAT has been built on top of the same hardware.

    Original DIX Ethernet was standardized by my manager, David Redell of Xerox. It was the bare minimum to do the job: 6-byte station destination, 6-byte source address, 2-byte packet length, a 2-byte Ethertype field (the latter 2 were combined for networks with hardware framing), and 32-bit CRC. No ARQ in the hardware. It leveraged the move to byte-based memories and small CPUs. It followed the end-to-end principle in system design just about optimally - the most minimal MAC design of all time. EASY TO BUILD UPON AND ENHANCE.
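    A minimal sketch of that 14-byte header layout (the `build_frame` helper is hypothetical, just for illustration; preamble and the trailing 32-bit FCS are added by the MAC hardware):

    ```python
    import struct

    # DIX header as described above: 6-byte destination, 6-byte source,
    # 2-byte length/EtherType, all big-endian (network byte order).
    def build_frame(dst: bytes, src: bytes, ethertype: int, payload: bytes) -> bytes:
        header = struct.pack("!6s6sH", dst, src, ethertype)
        return header + payload

    frame = build_frame(b"\xff" * 6,                    # broadcast destination
                        b"\x02\x00\x00\x00\x00\x01",    # example source
                        0x0800,                         # EtherType: IPv4
                        b"hi")
    assert len(frame) == 14 + 2     # 14-byte header + 2-byte payload
    ```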

    Ethernet (CSMA/CD) is a protocol that copies human speech patterns. After someone stops speaking, people hear the quiet (carrier sense multiple access / CSMA) and wait a very short and randomized amount of time before beginning to speak. If two speakers collide, they hear the collision and shut up (CD = collision detection). They both pick a randomized amount of time to pause before trying again. On the second, third, etc. collision, people wait longer and longer before retrying.

    The thing about original Ethernet (1981) is that it wastes 2/3 of the channel, because a highly loaded channel has too many collisions and too many backoffs. But deployment and wiring were expensive, so running a single wire throughout a building was the cheapest possible way to start (enhanced by thinwire Ethernet and twisted pair to get a less bulky cable a few years later). The frame design was PERFECT, and within ~10 years people were using Ethernet frames to build switched networks; today only radio networks are CSMA/CD = Ethernet.
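    The wait-longer-after-each-collision rule is truncated binary exponential backoff; a sketch, assuming the classic 10 Mbit/s parameters (51.2 us slot time, exponent capped at 10):

    ```python
    import random

    # After the n-th collision, wait a random number of slot times
    # in [0, 2^min(n,10) - 1]; classic Ethernet gives up after 16 tries.
    SLOT_US = 51.2                          # slot time at 10 Mbit/s

    def backoff_slots(collisions: int) -> int:
        k = min(collisions, 10)             # truncate the exponent
        return random.randrange(2 ** k)     # 0 .. 2^k - 1 slots

    delay_us = backoff_slots(3) * SLOT_US   # somewhere in 0 .. 358.4 us
    ```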

    I was in Xerox SDD in the early 80's. I have lots of memories of dealing with the large coax taps, which were in the ceiling.

    I also remember setting up a Star demo at the NCC where someone forgot the coax cable terminators (or was short one terminator?), which was causing reflection issues with the signal; this was solved by cutting the cable to a precise length to get the demo working.


    In the original ethernet design, routers were not used (because 8-bit processors were too slow and 16-bit processors were just starting to emerge). So the original standard proposed repeaters as the way to extend a large network, and this was a very very cheap analog way to grow your network. It was quite common to have a whole building or even several nearby buildings on one ethernet and then a high speed (i.e. 56 Kbps or maybe even a T-1) link to other buildings either nearby or in other cities.


    Maybe you know, why isn't the CRC at the end? Then you could stream the packet instead of needing to construct it and then go back to the header to write the CRC.


    Back when I was doing a lot of Ethernet driver work, I joked to colleagues about what I'd do if I had a time machine. Go back and kill Hitler? No. Go back and stop John Wilkes Booth from shooting Lincoln? No. I'd go back and convince Bob Metcalfe to make Ethernet headers 16 bytes rather than 14, to avoid all sorts of annoying alignment issues.


    Yeah, those alignment issues surely have killed more Jews than that Hitler guy. /s


    Lol. No, it's more like: everybody else will line up to take care of those more important things when they get access to a time machine, but when I get access to a time machine, I want to take care of my pet peeve :)


    While you're back there, convince people to do IP truncation rather than fragmentation. Truncation would probably be a lot more useful at lower cost than fragmentation, and maybe path MTU problems wouldn't still be an issue. *grumble*grumble*

    I feel extremely lucky to have attended one of his keynotes at a networking conference once, and for the quick opportunity to greet him.

    Well deserved.

    If only USB was half as reliable as Ethernet ...

    The anti-Turing award goes to the inventors of USB.


    Around 2002 he had the wild idea about Ethernet for intra-bus and peripheral communication. Nobody in the room thought it was a good idea. Glad he was smart enough to abandon that idea and stick with networking. I didn't want my mouse and keyboard getting an IP address.


    Wifi would have to go to Steve Jobs :-) Lucent was sitting on 802.11 (WaveLAN) for ten years selling super expensive products targeting niche markets and it took Apple to move things forward. More in “Oral History of Arthur “Art” Astrin”, wifi pioneer: https://youtu.be/Tj5NNxVwNwQ

    From 1990, the 802.11 standards body was gyrating on radio-based 802.11 ideas.

    That body would not even have existed without Ethernet.


    The award coming to Metcalfe for Ethernet seven years after it went to Berners-Lee for the WWW is amusing, though.

    ACM awards are dominated by the theory community. A lot of theoreticians with NO impact on the world have awards. Metcalfe was one of a dozen people who co-invented Ethernet, and he does not fit the historians' "Great Man" theory, where history is decided by a few "Great Men" who went a different direction at a critical moment ... Ethernet's success is only 25% due to him.

    For example, in 1979 at UIUC a grad student built 230 kbps S-100 cards using RS-232 chips, and I wrote the Z-80 CSMA/CD drivers (as a high school student), so it was not rocket science.

    So there was reluctance to give him an award for something he didn't pioneer all alone.


    This just shows what a huge joke the Turing Award process is! He should have gotten this award by 2000 or never at all! But the committee was too busy giving out awards for writing sexy-sounding papers about stoplight verification and zero-knowledge proofs to honor someone who disrupted the whole field!


  3. Concurrency Freaks: 50 years later, is Two-Phase Locking the best we can do?

    Two phase locking (2PL) was the first of the general-purpose Concurrency Controls to be invented which provided Serializability. In fact, 2PL gives more than Serializability, it gives Opacity, a much stronger isolation l ... (12997 chars. See body)


    Two phase locking (2PL) was the first of the general-purpose Concurrency Controls to be invented which provided Serializability. In fact, 2PL gives more than Serializability, it gives Opacity, a much stronger isolation level.
    2PL was published in 1976, which incidentally is the year I was born, and it is likely that Jim Gray and his buddies had this idea long before it was published, which means 2PL first came to existence nearly 50 years ago.

    After all that time has passed, is this the best we can do?

    Turns out no, we can do better with 2PLSF, but let's start from the beginning.

    When I use the term "general-purpose concurrency control" I mean an algorithm which allows accessing multiple objects (or records or tuples, whatever you want to name them) with all-or-nothing semantics. In other words, an algorithm that lets you do transactions over multiple data items.
    Two-Phase Locking has several advantages over the other concurrency controls that have since been invented, but in my view there are two important ones: simplicity and a strong isolation level.

    In 2PL, before accessing a record for read or write access, we must first take the lock that protects this record.
    During the transaction, we keep acquiring locks for each access, and only at the end of the transaction, when we know that no further accesses will be made, can we release all the locks. Having an instant in time (i.e. the end of the transaction) where all the locks are taken on the data that we accessed, means that there is a linearization point for our operation (transaction), which means we have a consistent view of the different records and can write to other records all in a consistent way. It doesn't get much simpler than this.
    Today, this idea may sound embarrassingly obvious, but 50 years ago many database researchers thought that it was ok to release the locks after completing the access to the record. And yes, it is possible to do so, but such a concurrency control is not serializable.
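    The two phases above can be sketched in a few lines of Python. This is purely illustrative (the `Record`/`Transaction` names and the transfer example are mine, not from any real 2PL implementation), but it shows the growing phase (only acquire) and the shrinking phase (release everything at commit):

```python
import threading

class Record:
    def __init__(self, value=0):
        self.lock = threading.Lock()
        self.value = value

class Transaction:
    """Strict 2PL sketch: acquire a record's lock before each access,
    release everything only at commit."""
    def __init__(self):
        self.held = []          # locks acquired so far (growing phase)

    def access(self, rec):
        if rec.lock not in self.held:
            rec.lock.acquire()  # growing phase: only ever acquire
            self.held.append(rec.lock)
        return rec

    def commit(self):
        # Shrinking phase: release all locks at once. The commit point,
        # where every accessed record is still locked, is the
        # linearization point of the whole transaction.
        for l in reversed(self.held):
            l.release()
        self.held.clear()

# A transfer between two records, all-or-nothing under 2PL.
a, b = Record(100), Record(0)
tx = Transaction()
tx.access(a).value -= 30
tx.access(b).value += 30
tx.commit()
```

    Note that in a real engine an abort would also have to undo the writes; this sketch only shows the locking discipline.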

    As for strong isolation, database researchers continue to invent concurrency controls that are not serializable, and write papers about them, as if Serializability were not that important for databases. On the other hand, all transactional commercial databases that I know of use 2PL or some combination of it with T/O or MVCC.

    Moreover, in the field of concurrent data structures, linearizability is the gold standard, which means 2PL is used heavily. If we need to write to multiple nodes of a data structure in a consistent way, we typically need something like 2PL, at least for the write accesses. The exception to this is lock-free data structures, but hey, that's why (correct) lock-free is hard!

    Ok, 2PL is easy to use and has strong isolation, so this means we're ready to go and don't need anything better than 2PL, right?
    I'm afraid not. 2PL has a couple of big disadvantages: poor read-scalability and live-lock progress.

    The classic 2PL was designed for mutual exclusion locks, which means that when two threads are performing a read-access on the same record, they will conflict and one of them (or both) will abort and restart.
    This problem can be solved by replacing the mutual exclusion locks with reader-writer locks, but it's not as simple as this.
    Mutual exclusion locks can be implemented with a single bit, representing the state of locked or unlocked.
    Reader-writer locks also need this bit and, in addition, a counter of the number of readers currently holding the lock in read mode. This counter needs enough bits to represent the number of readers: for example, 7 bits can count up to 127 concurrent readers, which caps the system at roughly 128 threads in case they all decide to acquire the read-lock on the same reader-writer lock instance.
    For such a scenario this implies that each lock would take 1 byte, which may not sound like much, but if you have billions of records in your DB then you will need billions of bytes for those locks. Still reasonable, but now we get into the problem of contention on the counter.
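    As a rough illustration of how the whole lock fits in one byte (the constants and helpers below are hypothetical; a real implementation would operate on an atomic byte with a CAS loop rather than plain functions):

```python
WRITER_BIT = 0x80          # top bit: write-locked flag
READER_MASK = 0x7F         # low 7 bits: reader count (0..127)

def try_read_lock(word):
    """Return the new lock byte with one more reader,
    or None if a writer holds the lock."""
    if word & WRITER_BIT:
        return None                 # writer present: conflict
    assert (word & READER_MASK) < READER_MASK, "reader counter overflow"
    return word + 1                 # bump the reader count

def try_write_lock(word):
    """The write-lock is only granted when there are no readers
    and no writer, i.e. the whole byte is zero."""
    return WRITER_BIT if word == 0 else None

w = try_read_lock(0)                # first reader arrives
w = try_read_lock(w)                # second reader arrives
assert w == 2
assert try_write_lock(w) is None    # writer must wait for readers to drain
```

    Every reader arrival and departure updates this one byte, which is exactly the contention problem the next paragraphs discuss.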

    Certain workloads have lots of read accesses on the same data, they are read-non-disjoint. An example of this is the root node of a binary search tree, where all operations need to read the root before they start descending the nodes of the tree.
    When using 2PL, each of these accesses on the root node implies a lock acquisition and even if we're using read-writer locks, it implies heavy contention on the lock that protects the root node.

    Previous approaches have taken a stab at this problem, for example TLRW by Dave Dice and Nir Shavit in SPAA 2010.
    By using reader-writer locks they were able to have much better performance than using mutual exclusion locks, but still far from what the optimistic concurrency controls can achieve.
    Take the example of the plot below where we have an implementation similar to TLRW with each read-access contending on a single variable of the reader-writer lock, applied to a binary search tree, a Rank-based Relaxed AVL. Scalability is flat regardless of whether we're doing mostly write-transactions (left side plot) or just read-transactions (right side plot).

    Turns out it is possible to overcome this "read-indicator contention" problem through the use of scalable read-indicators. Our favorite algorithm is a reader-writer lock where each reader announces its arrival/departure on a separate cache line, thus having no contention for read-lock acquisition. The downside is that the thread taking the write-lock must scan through all those cache lines to ascertain whether the write-lock can be granted, thus incurring a higher cost for write-lock acquisition.
    As far as I know, the first reader-writer lock algorithms with this technique were shown in the paper "NUMA-Aware Reader-Writer Locks", of which Dave Dice and Nir Shavit are two of the authors, along with Irina Calciu, Yossi Lev, Victor Luchangco, and Virendra Marathe.
    This paper shows three different reader-writer lock algorithms, two with high scalability, but neither is starvation-free.
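    A conceptual sketch of such a lock, assuming a fixed thread count (the class name is mine, and in Python the "separate cache line" part can only be simulated by giving each thread its own array slot; a real implementation would pad each slot to a cache line and use atomics):

```python
import threading

NUM_THREADS = 8

class DistributedRWLock:
    """Sketch of a reader-writer lock with per-thread read-indicators.
    Each reader touches only its own slot (its own cache line, in a real
    implementation); the writer pays by scanning every slot."""
    def __init__(self):
        self.reader = [0] * NUM_THREADS   # one read-indicator per thread
        self.writer = threading.Lock()    # stand-in for the write bit

    def read_lock(self, tid):
        while True:
            self.reader[tid] = 1          # announce arrival
            if not self.writer.locked():
                return                    # no writer: read-lock granted
            self.reader[tid] = 0          # writer present: back off, retry

    def read_unlock(self, tid):
        self.reader[tid] = 0              # announce departure

    def write_lock(self):
        self.writer.acquire()
        while any(self.reader):           # scan all read-indicators
            pass                          # spin until readers drain

rw = DistributedRWLock()
rw.read_lock(3)                           # only slot 3 is touched
rw.read_unlock(3)
rw.write_lock()                           # no readers: acquired at once
```

    Two readers on different threads never write to the same slot, which is what removes the read-side contention.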

    So what we did was take some of these ideas to make a better reader-writer lock, which also scales well for read-lock acquisition but has other properties, and we used this to implement our own concurrency control which we called Two-Phase Locking Starvation-Free (2PLSF).
    The reader-writer locks in 2PLSF have one bit per thread reserved for the read-lock but they are located in their own cache line, along with the bits (read-indicators) of the next adjacent locks.


    Like on the "NUMA-Aware reader-writer locks" paper, the cost shifts to the write-lock acquisition which needs to scan multiple cache lines to acquire the write-lock. There is no magic here, just trade-offs, but this turns out to be a pretty good trade-off as most workloads tend to be on the read-heavy side. Even write-intensive workloads spend a good amount of time executing read-accesses, for example, during the record lookup phase.
    With our improved reader-writer lock the same benchmark shown previously for the binary search tree looks very different:

    With this improved reader-writer lock we are able to scale 2PL even on read-non-disjoint workloads, but it still leaves out the other major disadvantage, 2PL is prone to live-lock.

    There are several variants of the original 2PL, some of these variants aren't even serializable, therefore I wouldn't call them 2PL anymore and won't bother going into that.
    For the classical 2PL, there are three variants and they are mostly about how to deal with contention. They're usually named:
        - No-Wait
        - Wait-Or-Die
        - Deadlock-detection

    When a conflict is encountered, the No-Wait variant aborts its own transaction (or the other transaction) and retries. The retry can be immediate, or deferred according to an exponential backoff scheme. The No-Wait approach has live-lock progress: two transactions, one attempting to modify record A and then record B while the other attempts to modify record B and then record A, may conflict indefinitely and abort-restart without either of them ever being able to commit.
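    A minimal sketch of the No-Wait variant with randomized exponential backoff (the names are mine; a real engine would also have to undo partial writes on abort, which this version avoids by only applying writes once all locks are held):

```python
import random
import threading
import time

class Record:
    def __init__(self, value=0):
        self.lock = threading.Lock()
        self.value = value

def no_wait_transaction(records, apply_fn, max_backoff=0.01):
    """No-Wait 2PL sketch: try-lock every record; on any conflict,
    release everything, back off randomly, and retry from scratch."""
    backoff = 0.0001
    while True:
        held = []
        for rec in records:
            if rec.lock.acquire(blocking=False):  # never wait on a lock
                held.append(rec.lock)
            else:
                break                             # conflict: abort
        if len(held) == len(records):
            apply_fn(records)                     # all locks held: commit
            for l in reversed(held):
                l.release()
            return
        for l in reversed(held):                  # roll back the lock set
            l.release()
        time.sleep(random.uniform(0, backoff))    # randomized backoff...
        backoff = min(backoff * 2, max_backoff)   # ...growing exponentially

x, y = Record(100), Record(0)
def transfer(recs):
    recs[0].value -= 30
    recs[1].value += 30
no_wait_transaction([x, y], transfer)
```

    The randomized backoff makes the A-then-B vs B-then-A live-lock unlikely in practice, but it never rules it out, which is the point of the paragraph above.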

    The Deadlock-Detection variant keeps an internal list of threads waiting on a lock and detects cycles (deadlocks).
    This is problematic for reader-writer locks because it would require each reader to have its own list, which itself needs a (mutual exclusion) lock to protect it. And detecting the cycles would mean scanning all the readers' lists when the lock is taken in read-lock mode.
    Theoretically it should be possible to make this scheme starvation-free, but it would require using starvation-free locks, and as there is no (published) highly scalable reader-writer lock with starvation-free progress, it kind of defeats the purpose. Moreover, having one list per reader may imply high memory usage. Who knows, maybe one day someone will try this approach.

    The Wait-Or-Die variant imposes an order on all transactions, typically with a timestamp of when the transaction started and, when a lock conflict arises, decides to wait for the lock or to abort, by comparing the timestamp of the transaction with the timestamp of the lock owner. This works fine for mutual exclusion locks as the owner can be stored in the lock itself using a unique-thread identifier, but if we want to do it for reader-writer locks then a thread-id would be needed per reader.
    If we want to support 256 threads then this means we need 8 bits x 256 = 256 bytes per reader-writer lock. Using 256 bytes per lock is a hard pill to swallow!
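    The Wait-Or-Die decision rule itself is tiny. A sketch, assuming the timestamp is the transaction's start order (smaller means older):

```python
def wait_or_die(my_ts, owner_ts):
    """Wait-Or-Die rule: an older transaction (smaller timestamp) waits
    for the lock; a younger one dies (aborts and restarts, keeping its
    original timestamp). Because waiting only ever happens in one
    direction of the order, no deadlock cycle can form."""
    return "wait" if my_ts < owner_ts else "die"

assert wait_or_die(my_ts=5, owner_ts=9) == "wait"   # older tx waits
assert wait_or_die(my_ts=9, owner_ts=5) == "die"    # younger tx aborts
```

    The cost, as the next paragraphs explain, is not this comparison but handing out the unique timestamps in the first place.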

    But memory usage is not the real obstacle here. The Wait-Or-Die approach implies that all transactions have a unique transaction id so as to order them, for example, they can take a number from an atomic variable using a fetch_and_add() instruction.
    The problem with this is that on most modern CPUs you won't be able to do more than 40 million fetch_and_add() operations per second on a contended atomic variable. This may seem like a lot (Visa does about 660 million transactions per day, so doing 40 million per second sounds pretty good), but when it comes to in-memory DBMS it's not that much, and for concurrent data structures in particular it's a bit on the low side.
    Even worse, this atomic fetch_and_add() must be done for all transactions, whether they are write-transactions or read-transactions.
    For example, in one of our machines it's not really possible to go above 20 M fetch_and_add() per second, which means that scalability suckz:

    To put this in perspective, one of my favorite concurrency controls is TL2, which was invented by (surprise!) none other than Dave Dice, Nir Shavit and Ori Shalev.
    I hope by now you know who the experts in this stuff are  ;)

    Anyways, in TL2 the read-transactions don't need to do an atomic fetch_and_add(), and they execute optimistic reads, which is faster than any read-lock acquisition you can think of. At least for read-transactions, TL2 can scale to hundreds of millions of transactions per second. By comparison, 2PL with Wait-Or-Die will never be able to go above 40 M tps.
    This means if high scalability is your goal, then you would be better off with TL2 than 2PL… except, 2PLSF solves this problem too.

    In 2PLSF only the transactions that go into conflict need to be ordered, i.e. only these need to do a fetch_and_add() on a central atomic variable. This has two benefits: it means there is less contention on the central atomic variable that assigns the unique transaction id, and it means that transactions without conflicts are not bounded by the 40 M tps plateau.
    This means that we can have 200 M tps running without conflict alongside 40 M tps that are in conflict, because the conflicting transactions are the only ones that need to do the fetch_and_add() and are therefore the only ones bounded by the maximum number of fetch_and_add() operations the CPU can execute per second.
    On top of this, the 2PLSF algorithm provides starvation-freedom.
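    The lazy-ordering trick can be sketched as follows. This is a simplification of the idea, not the actual 2PLSF code, and `itertools.count` merely stands in for the contended fetch_and_add counter (it is not itself thread-safe):

```python
import itertools

_clock = itertools.count(1)   # stand-in for the global fetch_and_add counter

class Tx:
    """2PLSF-style lazy ordering: a transaction only draws a global
    timestamp the first time it runs into a conflict. Conflict-free
    transactions never touch the shared counter at all."""
    def __init__(self):
        self.ts = None                # no timestamp assigned yet

    def on_conflict(self):
        if self.ts is None:
            self.ts = next(_clock)    # the only "fetch_and_add"
        return self.ts                # reused on later conflicts

fast = Tx()                   # commits without conflicts: no atomic op
assert fast.ts is None
slow = Tx()
slow.on_conflict()            # first conflict: draw a timestamp
assert slow.ts == 1
assert slow.on_conflict() == 1   # the timestamp is drawn at most once
```

    Only the `slow` transactions pay for the shared counter, which is why the conflict-free ones are not bounded by the 40 M fetch_and_add() plateau.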

    Summary

    In this post we saw some of the advantages and disadvantages of 2PL and some of the variants of 2PL.
    We explained what it takes to scale 2PL: make a better reader-writer lock.
    But the big disadvantage of 2PL is the live-lock progress, which some variants could seemingly resolve, but in practice they don't because they will not scale, even with a better reader-writer lock.
    Then we described 2PLSF, a novel algorithm invented by me, Andreia and Pascal Felber to address these issues.

    In summary, 2PLSF is what 2PL should have been from the start, a concurrency control that scales well even when reads are non-disjoint and that provides starvation-free transactions, the highest form of blocking progress there is.
    Moreover, it's pretty good at solving certain kinds of conflicts, which means it can keep scaling even when some conflicts arise. 2PLSF is not perfect, but it's good enough, and certainly better than TL2 when it comes to resolving conflicts.

    Despite being two-phase locking, it's as close to 2PL as a jackhammer is to a pickaxe. 

    2PLSF is not your grandfather's 2PL



Random expired ±

2 links selected from 412 expired links

Expired ±

412 links expired today Thu, Jan.18, 2024
  1. In Praise Of Anchovies: If You Don’t Already Love Them, You Just Haven’t Yet Discovered How Good They Can Be

    For many people, anchovies are one of those foods to be avoided like the plague. But for Ken Gargett anchovies are not a love-it-or-hate it food. Rather, they are a love-it-or-you-have-not-discovered-how-good-they-can-be (Open link)

  2. The Kaizen Way

    Are you looking for a new approach to health? Do you want to finally get the results you have been hoping for? How do you find a practitioner that is willing to try a different approach and guide you through your journey (Open link)

  3. Diátaxis

    The Diátaxis framework solves the problem of structure in technical documentation, making it easier to create, maintain and use. (Open link)

  4. Let's talk SkipList

    BackgroundSkipLists often come up when discussing “obscure” data-structures but in reality they are not that obscure, in fact many of the production grade softwares actively use them. In this post I’ll try to go into Ski (Open link)

  5. Owen Pomery

    OWEN D. POMERY WORK SHOP ABOUT + CONTACT EDITORIAL RELIEFS EDITION AIRBNB FLAT EYE ERNEST POHA HOUSE CONCEPT SENET VICTORY POINT KIOSK SCI-FI SPOT ILLUSTRATIONS GAME OF THRONES (GAME) NARRATIVE ARCHIT (Open link)

  6. Shields Down

    Resignations happen in a moment, and it’s not when you declare, “I’m resigning.” The moment happened a long time ago when you received a random email from a good friend who asked, “I know you’re really happy with your cu (Open link)

  7. The Grug Brained Developer

    Introduction this collection of thoughts on software development gathered by grug brain developer grug brain developer not so smart, but grug brain developer program many long year and learn some things although mostly s (Open link)

  8. You and your mind garden

    In French, “cultiver son jardin intérieur” means to tend to your internal garden—to take care of your mind. The garden metaphor is particularly apt: taking care of your mind involves cultivating your curiosity (the seeds (Open link)

  9. What's in a Good Error Message?

    In a way, an error message tells a story; and as with every good story, you need to establish some context about its general settings. For an error message, this should tell the recipient what the code in question was tr (Open link)

  10. Maintaining a medium-sized Java library in 2022 and beyond

    scc --exclude-dir docs/book ─────────────────────────────────────────────────────────────────────────────── Language Files Lines Blanks Comments Code Complexity ──────────────────────────────── (Open link)

  11. The Book

    This is the story of Simon Wardley. Follow his journey from bumbling and confused CEO lost in the headlights of change to someone with a vague idea of what they're doing. (Open link)

  12. A Better Way to Manage Projects

    The GOVNO framework is a novel approach to project management that aims to improve upon the shortcomings of the popular scrum methodology. Each letter of the acronym represents a key aspect of the framework: G: Governan (Open link)

  13. User space

    For the term "user space" as used in Wikipedia, see Wikipedia:User pages. "Kernel space" redirects here. For the mathematical definition, see Null space. This article needs additional citations for verification. Please h (Open link)

  14. here

    Sample code and instructions for steps through different container image build options. - GitHub - maeddes/options-galore-container-build: Sample code and instructions for steps through different container image build op (Open link)

  15. Convey

    2022 February 01 16:21 stuartscott 1473754¤ 1240149¤ You may have noticed that the January edition of the Convey Digest looks a little different from the previous ones - the color scheme is now based on the dominant (Open link)

  16. Top 10 Architecture Characteristics / Non-Functional Requirements with Cheatsheet | by Love Sharma | Jun, 2022 | Dev Genius

    Imagine you are buying a car. What essential features do you need in it? A vehicle should deliver a person from point A to point B. But what we also check in it is Safety, Comfort, Maintainability… (Open link)

  17. Manhattan Phoenix review: epic history of how New York was forged by fire – and water | Books | The Guardian

    Daniel Levy pins the great fire of 1835 as the birth event of modern Manhattan in a tale as teeming as the city itself (Open link)

  18. Optimizing Distributed Joins: The Case of Google Cloud Spanner and DataStax Astra DB | by DataStax | Building the Open Data Stack | Medium

    In this post, learn how relational and NoSQL databases, Google Cloud Spanner and DataStax Astra DB, optimize distributed joins for real-time applications. Distributed joins are commonly considered to… (Open link)

  19. https://rust-book.cs.brown.edu/

    Welcome to the Rust Book experiment, and thank you for your participation! First, we want to introduce you to the new mechanics of this experiment. The main mechanic is quizzes: each page has a few quizzes about the pag (Open link)

  20. 7 days ago

    This article has a large gap in the story: it ignores sensor data sources, which are both the highest velocity and highest volume data models by multiple orders of magnitude. They have become ubiquitous in diverse, mediu (Open link)

  21. What's in a Good Error Message? - Gunnar Morling

    In a way, an error message tells a story; and as with every good story, you need to establish some context about its general settings. For an error message, this should tell the recipient what the code in question was tr (Open link)

  22. Beyond Microservices: Streams, State and Scalability

    Gwen Shapira talks about how microservices evolved in the last few years, based on experience gained while working with companies using Apache Kafka to update their application architecture. (Open link)

  23. https://programmerweekly.us2.list-manage.com/track/click?u=72f68dcee17c92724bc7822fb&id=c6a9958764&e=d7c3968f32

    Ever since I started to work on the Apache APISIX project, I’ve been trying to improve my knowledge and understanding of REST RESTful HTTP APIs. For this, I’m reading and watching the following sources: Books. At the mom (Open link)

  24. The Four Innovation Phases of Netflix’s Trillions Scale Real-time Data Infrastructure | by Zhenzhong Xu | Feb, 2022 | Medium

    The blog post will share the four phases of Real-time Data Infrastructure’s iterative journey in Netflix (2015-2021). For each phase, we will go over the evolving business motivations, the team’s unique challenges, the (Open link)